In this research work, the EGA method is used for discovering the pathways in PPI networks. The set of unaligned protein interactions is given as an input to the proposed and the existing algorithms. The input protein interactions are of the same length. The pathway analysis is achieved using the proposed and the existing algorithms such as GA, ROLS, MAX-CSP, MIN-SAT and EGA. The proposed algorithm is very efficient to provide a better solution and discover the pathways in PPI when compared to other algorithms. Figure 1. represents the overall framework for the proposed EGA. The Fig. 1(a) shows the key steps tangled in the proposed EGA algorithm. The pseudocode of the proposed EGA is shown in Fig. 2
Weight Calculation
The following parameters are used to calculate the weight of a protein interaction (Fig. 1(c)).
Protein Weight
The protein weight is derived from the total number of individual protein interaction with a particular protein from the total number of proteins in a network is given in Eq. (3).
Protein Weight= \(\frac{\text{T}\text{o}\text{t}\text{a}\text{l} \text{N}\text{u}\text{m}\text{b}\text{e}\text{r} \text{o}\text{f} \text{i}\text{n}\text{t}\text{e}\text{r}\text{a}\text{c}\text{t}\text{e}\text{d} \text{p}\text{r}\text{o}\text{t}\text{e}\text{i}\text{n}\text{s}}{\text{T}\text{o}\text{t}\text{a}\text{l} \text{n}\text{u}\text{m}\text{b}\text{e}\text{r}\text{s} \text{o}\text{f} \text{p}\text{r}\text{o}\text{t}\text{e}\text{i}\text{n}\text{s}}\) (3)
Edge Weight
The edge weight of an edge (Prot1, Prot2) is calculated using the probability of interacting protein pairs Prot1 and Prot2 which is shown in Eq. (4),
P (interact (Prot1, Prot2)) = 1- \(\prod _{\text{i}\in \text{P}\text{r}\text{o}\text{t}1,\text{P}\text{r}\text{o}\text{t}2}(1-\text{x}\left(\text{i}\right)\)) (4)
Where, i - member of set IProt1, Prot2; x(i)- reliability of experimental type i (reliability value 0.6)
Seed Generation
It is used to generate random paths in Protein-Protein Interaction network (Fig. 1(d)). After generating, the path weight is calculated using Eq. (5)
w(P) = \(\prod _{\text{v}\in \text{P}}\text{w}\left(\text{n}\right)\) * \(\prod _{\text{e}\in \text{P}}\text{w}\left(\text{e}\right)\) (5)
Where,
w(n) – weight of the node or vertex, w(e) – weight of the edge
Selection Phase
In selection phase (Fig. 1(e)), the individuals are sorted in the mating pool based on their fitness and randomly choose all the two best individuals in the current population. The fitness value is calculated according to the path weight.
Crossover Phase
A single point crossover is used to create a new child from the parent (Fig. 1(f)). The hamming distance is used to calculate the minimum distance. In a Single point crossover both parents' organism strings are chosen in which each data beyond single point is considered as organism string and swapped among the two parent organisms.
Mutation Phase
With the better path, the mutation operation is accomplished to produce new children, which perform modifications to deliver the possible variance for the offspring (Fig. 1(g)). The bits are randomly selected and modify its state to the opposite state.
Enhanced Genetic Algorithm
The proposed work describes an EGA for discovering the pathways in PPI networks. GA algorithm can perform well only when the population has enough diversity. If the population for an optimized problem converges too early, the results will be suboptimal as it cannot generate offspring’s that are superior to their parents and hence will be hard to find an optimal solution. So, in order to overwhelm such drawbacks a hamming distance is proposed in this work to calculate accurate distance between chromosomes in order to achieve population diversity and it is given in Eq. (6).
Hamming Distance = dij =\(\frac{h\left({p}_{i-pj}\right)}{l}\) (6)
l is the length of chromosome (path length), h is the hamming distance