Figure 3 depicts the overall workflow of the proposed model. System calls collected from the KDD Cup 99 dataset are given to the pre-processing stage. The system calls are collected in raw form and pre-processed using a sliding window mechanism, then passed to feature selection, where Particle Swarm Optimization (PSO) is used to select the features that contribute most to the prediction. The selected features are passed to the decision network, a Back Propagation Network (BPN), which processes them through several layers and reports whether the system is behaving normally or not.
3.1 Dataset Description
To execute this model, we used the KDD Cup 99 dataset created at MIT Lincoln Laboratories. The dataset was built by introducing manually generated network-based attacks. The various attacks that can potentially be found in a network are briefly characterized in the KDD intrusion detection evaluation dataset [6]. The system is analyzed at different levels, such as accuracy, capability, and the cost of distinguishing abnormal behavior from normal behavior. An IDS can work on the basis of either privileged-process behavior or user behavior; privileged processes have the privileges to access and use system resources. All normal system calls are gathered in the normal-trace step, while abnormal system call sequences are gathered in the abnormal-trace step. The KDD Cup 99 dataset is used to collect both normal and abnormal traces. The system call sequences of the stide, xlock, ps, and login processes are collected from the KDD Cup 99 dataset. Repeated execution of these processes generates system call sequences, which are recorded in separate files. Each trace contains from ten to a thousand system calls. These traces are collected while no malicious activity is taking place. Examples of abnormal processes are iprcp, buffer overflow, sunsendmailcp, etc.
Another example is the syslog attack, which uses interfaces such as syslog to cause a buffer overflow in sendmail. The intrusion traces contain three sunsendmailcp attacks, five forwarding-loop error conditions, two traces of syslog-local attacks, two traces of syslog-remote attacks, and two traces of decode attacks. Each trace record contains two attributes: a process ID and a system call value; the process ID identifies which process issued the specific system call. An abnormal process will not produce the sequences of normal system calls (Figure 4). The current sequence of system calls can therefore be compared with the stored sequences of normal system calls, and deviations can be detected [7][8].
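As an illustration of this trace format, the following minimal Python sketch loads such traces. It assumes each line of a trace file holds a process ID and a system call value separated by whitespace; the exact file layout and the helper name load_traces are assumptions for illustration, not specified in the source.

```python
from collections import defaultdict

def load_traces(path):
    """Group system call values by process ID, preserving trace order.

    Assumes one 'PID syscall' pair per line (illustrative layout).
    """
    traces = defaultdict(list)
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip malformed lines
            pid, syscall = int(parts[0]), int(parts[1])
            traces[pid].append(syscall)
    return traces
```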
3.2 Data Pre-processing
After collecting the system call sequences from the active processes, the next step is pre-processing of the data. The gathered system call information is a raw collection, and pre-processing techniques must be applied to this raw data to turn it into a processable dataset. A unique number is assigned to each system call name, for instance 8 for open, 9 for close, and 74 for mmap. This unique numbering makes the system calls easy to access, reduces data complexity, and gives a format convenient for processing. Long system call sequences can then be processed with a sliding window mechanism. The normal-behavior database uses a window size of 3. For example, the normal-behavior database can be created from the system call sequence open, read, mmap, mmap, open, read, close: as the window slides over the sequence, the calls following each current call are recorded at position 1, position 2, and position 3, as shown in the table below. The window size determines the pairs generated. Table 1 shows the sequences of system calls in the proposed system.
Table 1

| Current | Position 1 | Position 2 | Position 3 |
|---------|------------|------------|------------|
| open    | read       | mmap       | mmap       |
| read    | mmap       | mmap       |            |
| mmap    | mmap       |            |            |
| read    | mmap       | mmap       | open       |
| mmap    | mmap       | open       |            |
| mmap    | open       |            |            |
| mmap    | mmap       | open       | read       |
| mmap    | open       | read       |            |
| mmap    | open       | read       | close      |
| open    | read       | close      |            |
| read    | close      |            |            |
By analyzing the data set, it is found that certain system calls are executed frequently, and these executions may be followed by different system calls. For example, read is executed two times and is followed by different system calls each time. Therefore, all system calls are recorded first, and the database is then expanded to merge the different sequences observed for each call. The expanded format is given in Table 2.
Table 2

| Current | Position 1  | Position 2  | Position 3  |
|---------|-------------|-------------|-------------|
| open    | read        | mmap, close | mmap        |
| read    | mmap, close | mmap        | open        |
| mmap    | mmap, open  | open, read  | read, close |
Using the sliding window, many system call sequences are produced and stored in the database. Once the raw information has been pre-processed into this database, normal-behavior rules can easily be formed from the data set.
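The following Python sketch illustrates this pre-processing step: mapping system call names to unique numbers and building the expanded lookahead database of Table 2. The numeric code for read and the helper name build_database are illustrative assumptions; only the codes for open, close, and mmap are given in the text.

```python
# Example codes from the text; the code for "read" is an assumed placeholder.
SYSCALL_IDS = {"open": 8, "close": 9, "read": 3, "mmap": 74}

def build_database(trace, window=3):
    """Return {call: {position: set of following calls}} for positions 1..window.

    This is the expanded (merged) form shown in Table 2.
    """
    db = {}
    for i, current in enumerate(trace):
        positions = db.setdefault(current, {p: set() for p in range(1, window + 1)})
        for p in range(1, window + 1):
            if i + p < len(trace):
                positions[p].add(trace[i + p])
    return db

sequence = ["open", "read", "mmap", "mmap", "open", "read", "close"]
numeric = [SYSCALL_IDS[name] for name in sequence]  # unique-number encoding
db = build_database(sequence)
# db["mmap"] -> {1: {"mmap", "open"}, 2: {"open", "read"}, 3: {"read", "close"}}
```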
3.3 Particle Swarm Optimization
Once the data is normalized, it is passed to the feature selection process, where the features that contribute most to the prediction variable or output of interest are selected. PSO is used here for feature selection. PSO is a parallel estimation method with the advantages of simple implementation, high accuracy, and fast convergence [19]. To find the best solution, PSO initializes some random solutions in the solution space; these solutions are particles, each defined by a velocity vi and a position xi. A fitness function is used to determine whether a particle's position is good; pbest and gbest record the individual best position and the global (social) best position, respectively. For every particle, its fitness is evaluated: if it is better than pbest, it becomes the new pbest, and if it is better than gbest, it becomes the new gbest; the particle's velocity and position are then updated. The velocity and position update rules are as follows:
$${v}_{i}=w{v}_{i}+{c}_{1}\times rand()\times \left({pbest}_{i}-{x}_{i}\right)+{c}_{2}\times rand()\times \left({gbest}_{i}-{x}_{i}\right)$$
$${x}_{i}={x}_{i}+{v}_{i}$$
(1)
where vi is the velocity of the particle, w is the inertia weight, rand() is a random value between 0 and 1, xi is the current position of the particle, and c1 and c2 are acceleration factors. If the velocity or position of a particle exceeds its allowed range, it is clipped to the maximum velocity or to the boundary position. After a particle has been updated, the process repeats until the best solution is found; the search usually stops when the best position is found or the maximum number of iterations is reached. In a BPN, the number of hidden layer nodes affects both the unsupervised pre-training stage and the fine-tuning of the supervised learning stage. Accordingly, the number of hidden layer nodes in the deep learning model is optimized by the PSO algorithm to improve the performance of the network.
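A minimal sketch of the update rule of Eq. (1) is given below. The parameter values (w = 0.7, c1 = c2 = 1.5), the search bounds, the velocity clamp, and the function name pso are illustrative assumptions, not values given in the text.

```python
import random

def pso(fitness, dim, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, lo=-10.0, hi=10.0, v_max=2.0):
    """Minimize `fitness` using the velocity/position update of Eq. (1)."""
    x = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]
    pbest_fit = [fitness(xi) for xi in x]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # Eq. (1): inertia + cognitive (pbest) + social (gbest) terms
                v[i][d] = (w * v[i][d]
                           + c1 * random.random() * (pbest[i][d] - x[i][d])
                           + c2 * random.random() * (gbest[d] - x[i][d]))
                v[i][d] = max(-v_max, min(v_max, v[i][d]))    # clamp velocity
                x[i][d] = max(lo, min(hi, x[i][d] + v[i][d]))  # clamp position
            f = fitness(x[i])
            if f < pbest_fit[i]:           # update individual best
                pbest[i], pbest_fit[i] = x[i][:], f
                if f < gbest_fit:          # update global best
                    gbest, gbest_fit = x[i][:], f
    return gbest, gbest_fit

# As the section suggests, the fitness could be the validation error of a
# BPN trained with round(x[0]) hidden nodes; a toy quadratic is used here.
best, err = pso(lambda x: sum(xi ** 2 for xi in x), dim=2)
```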
3.4 Back Propagation Network
A back propagation network learns by example. When example patterns are presented to the network, the algorithm adjusts the weights of the network so that, once training is complete, the network gives the required output for a particular input. Back propagation networks can be used for simple pattern identification and mapping tasks. As mentioned, a particular input must be given in order to obtain the desired output, as illustrated in Figure 5.
If the first pattern is presented to the network, we would like the output to be 0 1, as shown in Figure 6 (yellow line = 1 and black = 0, as in the previous examples). The input together with its corresponding target is called a training pair.
Once the network is trained, it will provide the desired output for any of the trained input patterns. The procedure works as follows. First, all the weights of the network are initialized to small random numbers (between -1 and 1). A forward pass is then performed: the input is presented and the output is calculated. Because the weights are random, this output will differ from the required output (the target). Each neuron's error is then calculated as Error = Target - Actual Output (i.e., what you want minus what you actually get). The error obtained at the output is used to change the weights so as to reduce the error; in this way each neuron's output moves closer to its required value. This is known as the reverse pass. The step is iterated until the error is minimal.
Algorithm 1: BPN
1. Present the input and obtain the output from the network. Since the initial weights are random numbers, remember that the first output can be anything.
2. Now we need to compute the error of neuron B. The error is what you want minus what you actually get, in other words: ErrorB = OutputB (1 - OutputB) (TargetB - OutputB). The "Output (1 - Output)" term is required in the equation because of the sigmoid function; if we were using only a threshold neuron it would simply be (Target - Output).
3. Now adjust the weight. Let WAB be the initial weight and W+AB the trained (new) weight: W+AB = WAB + (ErrorB x OutputA). Notice that it is the output of the connecting neuron (neuron A) that is used, not B's. All the weights in the output layer are updated in this way.
4. Calculate the errors for the hidden layer neurons. We cannot compute them directly from the output layer because we do not have a target for them (this is why the algorithm is known by that name): the errors of the output neurons are run back through the weights to obtain the errors of the hidden layer. For example, if neuron A is connected to B and C as shown, then to produce an error for A we take the errors from B and C: ErrorA = OutputA (1 - OutputA) (ErrorC WAC + ErrorB WAB). Again, the factor "Output (1 - Output)" is present because of the sigmoid squashing function.
5. Once the errors for the hidden layer neurons have been obtained, the next step is to adjust the hidden layer weights. By repeating this procedure, the method can be made to work for networks with any number of layers. Occasionally there may be doubts about its capability; the computation of an FCN illustrates it, consisting of 2 inputs, 3 hidden layer neurons, and 2 outputs, where w+ denotes the new (recalculated) weight and w (without superscript) the old weight. The reverse pass can be calculated in the same way.
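A minimal Python sketch of steps 1-5 for the 2-3-2 network mentioned in step 5 follows. The class name BPN, the learning rate eta, and the absence of bias terms are assumptions made for illustration; the source's update rule W+AB = WAB + (ErrorB x OutputA) corresponds to eta = 1.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class BPN:
    """Minimal 2-3-2 back propagation network following steps 1-5."""

    def __init__(self, n_in=2, n_hid=3, n_out=2, eta=0.5):
        rnd = lambda: random.uniform(-1.0, 1.0)  # small random initial weights
        self.w_ih = [[rnd() for _ in range(n_hid)] for _ in range(n_in)]
        self.w_ho = [[rnd() for _ in range(n_out)] for _ in range(n_hid)]
        self.eta = eta  # assumed learning rate (the text implies eta = 1)

    def forward(self, x):
        """Step 1: forward pass through hidden and output layers."""
        self.h = [sigmoid(sum(x[i] * self.w_ih[i][j] for i in range(len(x))))
                  for j in range(len(self.w_ih[0]))]
        self.o = [sigmoid(sum(self.h[j] * self.w_ho[j][k] for j in range(len(self.h))))
                  for k in range(len(self.w_ho[0]))]
        return self.o

    def backward(self, x, target):
        """Steps 2-5: reverse pass updating output and hidden weights."""
        # Step 2: output error, Error = Output(1 - Output)(Target - Output)
        d_out = [self.o[k] * (1 - self.o[k]) * (target[k] - self.o[k])
                 for k in range(len(self.o))]
        # Step 4: hidden error, output errors run back through the weights
        d_hid = [self.h[j] * (1 - self.h[j]) *
                 sum(d_out[k] * self.w_ho[j][k] for k in range(len(d_out)))
                 for j in range(len(self.h))]
        # Steps 3 and 5: each update uses the output of the *source* neuron
        for j in range(len(self.h)):
            for k in range(len(d_out)):
                self.w_ho[j][k] += self.eta * d_out[k] * self.h[j]
        for i in range(len(x)):
            for j in range(len(d_hid)):
                self.w_ih[i][j] += self.eta * d_hid[j] * x[i]

net = BPN()
for _ in range(2000):                      # iterate forward/reverse passes
    net.forward([0.0, 1.0])
    net.backward([0.0, 1.0], [0.0, 1.0])   # training pair: input and target
```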