An Enhanced Genetic Algorithm for Determining the Pathways in Protein-Protein Interaction Networks

doi:10.21203/rs.3.rs-1427073/v1


Research Article


This work is licensed under a CC BY 4.0 License


Abstract

Biological pathways play a significant role in understanding the evolution and cellular activities of an organism. Protein-Protein Interaction (PPI) networks are undirected, so the interactions must be oriented before pathways can be found in them; orienting protein interactions therefore enhances the pathway discovery process. To overcome the drawbacks of existing algorithms, an Enhanced Genetic Algorithm (EGA) is proposed to reduce unnecessary edges and discover pathways in PPI networks. The proposed algorithm is compared experimentally with existing algorithms, namely Genetic Algorithm (GA), Random Orientation Algorithm plus Local Search (ROLS), Maximum Constraint Satisfaction (MAX-CSP) and Minimum Satisfiability (MIN-SAT). The experiments are carried out on BioGRID databases, and the results indicate that the proposed EGA outperforms the existing techniques in terms of execution time, fitness function value and, in particular, matching gold standard pathways.

Protein-Protein Interaction

Enhanced Genetic Algorithm

Signalling Pathways

KEGG

BioGRID 3.4.154

BioGRID 2.0.50

1. Introduction

A Protein-Protein Interaction (PPI) connects two or more proteins that regularly act together to perform their biological function [1]. PPIs play key roles in numerous cellular functions, such as DNA translation, metabolic cycles and signalling cascades [2]. A PPI network is a collection of protein interactions, often deposited in an online database. It is represented as a graph in which a node denotes a protein and an edge denotes the interaction between a pair of proteins. PPI databases are the main source of protein interaction data in biological cells [3]. These databases are typically vast because data accumulate over time from experimental results, so extracting new knowledge from them has become a challenge in bioinformatics. An important challenge is to establish the signalling pathways or networks that convey information from known sources to targets [4]. The undirected PPI network is first oriented, and the directed pathways from source to target are then identified. This is a complicated problem because several paths may link two proteins in an interaction network [5]. The reconstruction of biological pathways has attracted a lot of attention, including the reconstruction of regulatory networks [6–10], the discovery of signalling networks and pathways [11–13] and the analysis of metabolic networks [14–16]. In general, proteins interact with each other and form protein complex networks, and these proteins are important for biological processes in the cell. A protein interaction network contains signalling pathways that start at a source protein and convey biological information to a specific target protein. The task is to extract the signalling pathways of length at most k from a source to a specific target and to compute the reliability of each interaction; highly reliable interactions lead to better pathways in the overall network.

Formulating the edge orientation

Consider a weighted undirected network G = (N, E), where N denotes the set of nodes or vertices (proteins) of the graph and E represents the set of edges, which describe the interactions between the proteins [18]. For a pair u, n \(\in\) N, the edge e = (u, n) \(\in\) E if and only if u and n can interact with each other. Let S \(\subseteq\) N be the set of source vertices of paths and T \(\subseteq\) N the set of target vertices of paths. Every vertex and edge in the graph has a weight, denoted w(n) and w(e) respectively. The protein weight is derived from the number of proteins that interact with a particular protein relative to the total number of proteins. The edge weight is calculated from the probability that the two proteins interact. A path has maximum length k and connects a source-target pair \(\langle s_i, t_i\rangle\), in which \(s_i \in S\) and \(t_i \in T\). Each path takes the form \(p = (n_1, n_2), (n_2, n_3), \ldots, (n_l, n_{l+1})\), where \(n_1 = s_i\), \(n_{l+1} = t_i\) and \(l \le k\) for some pair \(\langle s_i, t_i\rangle\). The weights express the reliability of the occurrence of an edge or of the participation of a protein in the path, and the weight of the path is calculated by Eq. (1).

w(p) = \(\prod_{n \in p} w(n) \cdot \prod_{e \in p} w(e)\) (1)

The aim is to orient each edge e = (u, n) \(\in\) E either from u to n or from n to u. A path is satisfied by the oriented network if and only if each of its edges \((n_j, n_{j+1})\) is oriented from \(n_j\) to \(n_{j+1}\). The goal is to maximize the total weight of the satisfied paths or, equivalently, to minimize the total weight of the unsatisfied paths; that is, to optimize the fitness function given in Eq. (2).

Fitness = \(\sum_{p \in P} Is(p) \cdot w(p)\) (2)

Here P is the set of paths of length at most k, \(w(p)\) is the path weight, and \(Is(p)\) is an indicator function that takes the value 1 if the path is satisfied and 0 otherwise. The remaining sections of this manuscript are organized as follows: Section 2 describes the background study, Section 3 presents the methodology of the EGA algorithm, Section 4 discusses the experimental results, and Section 5 gives the conclusion and future enhancements.
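The definitions in Eqs. (1) and (2) can be illustrated with a minimal Python sketch. The toy network, its weights, the orientation encoding and the helper names `path_weight`, `is_satisfied` and `fitness` are assumptions made for illustration only; they are not taken from the authors' implementation.

```python
from math import prod


def path_weight(path, node_w, edge_w):
    """Eq. (1): product of node weights times product of edge weights."""
    edges = zip(path, path[1:])
    return prod(node_w[n] for n in path) * prod(edge_w[frozenset(e)] for e in edges)


def is_satisfied(path, orientation):
    """Is(p): 1 if every edge of the path is oriented along the path, else 0."""
    return int(all(orientation[frozenset((a, b))] == (a, b)
                   for a, b in zip(path, path[1:])))


def fitness(paths, orientation, node_w, edge_w):
    """Eq. (2): total weight of the satisfied paths."""
    return sum(is_satisfied(p, orientation) * path_weight(p, node_w, edge_w)
               for p in paths)


# Hypothetical toy network with one source "s" and one target "t".
node_w = {"s": 1.0, "a": 0.5, "b": 0.5, "t": 1.0}
edge_w = {frozenset(("s", "a")): 0.8, frozenset(("a", "t")): 0.5,
          frozenset(("s", "b")): 0.4, frozenset(("b", "t")): 0.5}
# Each undirected edge is assigned one direction (tail, head).
orientation = {frozenset(("s", "a")): ("s", "a"), frozenset(("a", "t")): ("a", "t"),
               frozenset(("s", "b")): ("b", "s"), frozenset(("b", "t")): ("b", "t")}
paths = [["s", "a", "t"], ["s", "b", "t"]]

total = fitness(paths, orientation, node_w, edge_w)
```

In this toy case only the path s → a → t is satisfied, because the edge between s and b is oriented from b to s, so the fitness equals the weight of that single path.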

2. Related Work

Protein-Protein Interaction (PPI) data are undirected [17], whereas signalling networks are directed. Constructing a signalling network therefore requires not only the correct set of proteins and interactions but also a direction for every edge. Pathways in the signalling database Kyoto Encyclopedia of Genes and Genomes (KEGG) [19] have, on average, only five edges between a target and its nearest source. The large volume of PPI data makes it challenging to determine biological pathways from the data. Recent proteomics studies have examined interactions between cellular proteins and molecules [20]. Technologies for measuring Protein-Protein Interactions do not provide information about the direction of edges, so orienting a given network is a great challenge [21]. The random orientation with local search algorithm has been used to identify paths in Protein-Protein Interaction networks. Weighted MIN-SAT [22] is an optimization form of the satisfiability problem in which weighted disjunctive clauses with at most k literals are given and the aim is to find an assignment to all variables that minimizes the sum of the weights of the satisfied clauses. A linear-time algorithm has been used to identify paths in a graph or network under numerous biologically motivated constraints [23]. The Maximum Tree Orientation (MTO) problem takes an undirected tree and a set of ordered pairs of vertices and seeks an orientation that satisfies a maximum number of pairs. The input is viewed as a set of source-target pairs, each connected by a single path of arbitrary length. MTO favours high-confidence edges and short paths and removes redundant pathways [24]. Because PPI information is most often undirected, orienting the interaction edges along which signals are transmitted is a distinctive problem, which motivates the search for an effective edge-orientation algorithm for PPI networks [13].

Bacterial Foraging Optimization-Genetic Algorithm (BFO-GA) has been used to solve the problem of multiple sequence alignment [30]. The Genetic Algorithm is one of the most widespread computational models for dealing with NP-hard problems. In this research work, the EGA technique improves the computational speed and reduces unnecessary edges in PPI networks. Several existing algorithms, namely GA [3], ROLS [12], MIN-SAT [22] and MAX-CSP [25], are compared with the proposed algorithm. The experimental comparison shows that the proposed algorithm provides a good solution for discovering pathways in PPI networks.

3. Methodology

In this research work, the EGA method is used to discover pathways in PPI networks. A set of unaligned protein interactions is given as input to the proposed and the existing algorithms. The input protein interactions are of the same length. Pathway analysis is performed with the existing algorithms (GA, ROLS, MAX-CSP and MIN-SAT) and with the proposed EGA. The proposed algorithm provides a better solution for discovering pathways in PPI networks than the other algorithms. Figure 1 shows the overall framework of the proposed EGA, and Fig. 1(a) shows the key steps involved in it. The pseudocode of the proposed EGA is shown in Fig. 2.

Weight Calculation

The following parameters are used to calculate the weight of a protein interaction (Fig. 1(c)).

Protein Weight

The protein weight is the number of proteins that interact with a particular protein divided by the total number of proteins in the network, as given in Eq. (3).

Protein Weight = \(\frac{\text{Total number of interacted proteins}}{\text{Total number of proteins}}\) (3)
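As a rough illustration of Eq. (3), the sketch below computes protein weights from a list of interaction pairs. It assumes that "interacted proteins" means the distinct interaction partners of a protein and that the total is the number of proteins appearing in the list; the function name `protein_weights` and the toy data are hypothetical.

```python
from collections import defaultdict


def protein_weights(interactions):
    """Eq. (3): distinct interaction partners / total number of proteins."""
    partners = defaultdict(set)
    for a, b in interactions:
        partners[a].add(b)
        partners[b].add(a)
    total = len(partners)  # every protein that appears in at least one interaction
    return {protein: len(neigh) / total for protein, neigh in partners.items()}


# Hypothetical toy interaction list with four proteins.
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
weights = protein_weights(interactions)
```

Here P3 interacts with three of the four proteins and gets weight 0.75, while P4 has a single partner and gets weight 0.25.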

Edge Weight

The edge weight of an edge (Prot1, Prot2) is the probability that the protein pair Prot1 and Prot2 interacts, as shown in Eq. (4):

P(interact(Prot1, Prot2)) = \(1 - \prod_{i \in I_{Prot1, Prot2}} \left(1 - x(i)\right)\) (4)

where i is a member of the set \(I_{Prot1, Prot2}\) and x(i) is the reliability of experimental type i (reliability value 0.6).
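A short sketch of Eq. (4) follows. It assumes that each experimental type reporting the interaction contributes one reliability value x(i), and it uses the value 0.6 quoted above for every type; the function name `edge_weight` is hypothetical.

```python
from math import prod


def edge_weight(reliabilities):
    """Eq. (4): 1 - product of (1 - x(i)) over the supporting experiment types."""
    return 1.0 - prod(1.0 - x for x in reliabilities)


# One supporting experiment type versus two, each with reliability 0.6.
w_one = edge_weight([0.6])
w_two = edge_weight([0.6, 0.6])
```

With one supporting experiment type the edge weight is 0.6; with two it rises to 1 - 0.4 × 0.4 = 0.84, so interactions confirmed by more experiment types receive higher weights.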

Seed Generation

Seed generation produces random paths in the Protein-Protein Interaction network (Fig. 1(d)). After a path is generated, its weight is calculated using Eq. (5):

w(P) = \(\prod_{n \in P} w(n) \cdot \prod_{e \in P} w(e)\) (5)

where w(n) is the weight of a node or vertex and w(e) is the weight of an edge.
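The text does not spell out how the random paths are drawn, so the following is only one plausible sketch: a random walk without repeated vertices that is kept when it reaches the target within k edges. The names `random_path` and `seed_population`, the retry limit and the toy adjacency list are all assumptions.

```python
import random


def random_path(adj, source, target, k, rng):
    """Random simple walk from source; return the path if it reaches target within k edges."""
    path = [source]
    while len(path) - 1 < k:
        if path[-1] == target:
            return path
        choices = [n for n in adj[path[-1]] if n not in path]
        if not choices:
            return None  # dead end before reaching the target
        path.append(rng.choice(choices))
    return path if path[-1] == target else None


def seed_population(adj, source, target, k, size, seed=0, max_tries=10_000):
    """Collect up to `size` random source-target paths of length at most k."""
    rng = random.Random(seed)
    population = []
    for _ in range(max_tries):
        if len(population) == size:
            break
        path = random_path(adj, source, target, k, rng)
        if path is not None:
            population.append(path)
    return population


# Hypothetical undirected toy network given as an adjacency list.
adj = {"s": ["a", "b"], "a": ["s", "b", "t"], "b": ["s", "a", "t"], "t": ["a", "b"]}
population = seed_population(adj, "s", "t", k=3, size=5, seed=1)
```

Each seed path can then be scored with Eq. (5) before it enters the population.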

Selection Phase

In the selection phase (Fig. 1(e)), the individuals in the mating pool are sorted by fitness, and two of the best individuals in the current population are chosen at random. The fitness value is calculated from the path weight.

Crossover Phase

A single-point crossover is used to create new children from the parents (Fig. 1(f)). The Hamming distance is used to calculate the minimum distance. In single-point crossover, one crossover point is chosen on both parents' chromosome strings, and all data beyond that point are swapped between the two parent chromosomes.
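Single-point crossover as described here can be sketched in a few lines. The example assumes chromosomes are equal-length lists of bits; the function name `single_point_crossover` and the sample parents are illustrative only.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Swap everything beyond `point` between two equal-length chromosomes."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


parent_a = [0, 0, 0, 0, 0, 0]
parent_b = [1, 1, 1, 1, 1, 1]
child_a, child_b = single_point_crossover(parent_a, parent_b, point=2)
# child_a == [0, 0, 1, 1, 1, 1] and child_b == [1, 1, 0, 0, 0, 0]
```

In a full run the crossover point would typically be drawn at random, for example with `random.randrange(1, len(parent_a))`, so that both children inherit material from both parents.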

Mutation Phase

Mutation is applied to the better paths to produce new children, introducing modifications that provide variation in the offspring (Fig. 1(g)). Bits are selected at random and flipped to the opposite state.
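The bit-flip step can be sketched as follows, assuming a bit-list chromosome and a per-bit mutation probability. The parameter `rate` and the function name `bit_flip_mutation` are assumptions; the text does not state the mutation rate used.

```python
import random


def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`; return a new chromosome."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


rng = random.Random(42)
original = [1, 0, 1, 1, 0, 0]
unchanged = bit_flip_mutation(original, rate=0.0, rng=rng)  # no bit is flipped
inverted = bit_flip_mutation(original, rate=1.0, rng=rng)   # every bit is flipped
mutated = bit_flip_mutation(original, rate=0.1, rng=rng)    # a few bits may flip
```

Returning a new list leaves the parent untouched, which keeps it available for later comparison with its offspring.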

Enhanced Genetic Algorithm

This work proposes an EGA for discovering pathways in PPI networks. A GA performs well only when the population has enough diversity. If the population converges too early, the results will be suboptimal because the algorithm cannot generate offspring that are superior to their parents, which makes an optimal solution hard to find. To overcome this drawback, a Hamming distance is used in this work to measure the distance between chromosomes in order to maintain population diversity; it is given in Eq. (6).

Hamming Distance = \(d_{ij} = \frac{h(p_i, p_j)}{l}\) (6)

where l is the length of the chromosome (path length) and h is the Hamming distance between chromosomes \(p_i\) and \(p_j\).
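Eq. (6) can be sketched directly in Python. The chromosomes are assumed to be equal-length sequences, and the population-level measure `mean_pairwise_diversity` is an illustrative assumption about how the distances might be aggregated, since the text does not say how they are combined.

```python
from itertools import combinations


def hamming(p_i, p_j):
    """h(p_i, p_j): number of positions at which two chromosomes differ."""
    if len(p_i) != len(p_j):
        raise ValueError("chromosomes must have the same length")
    return sum(a != b for a, b in zip(p_i, p_j))


def normalized_hamming(p_i, p_j):
    """Eq. (6): d_ij = h(p_i, p_j) / l, where l is the chromosome length."""
    return hamming(p_i, p_j) / len(p_i)


def mean_pairwise_diversity(population):
    """Average d_ij over all pairs (assumed aggregate, not stated in the text)."""
    pairs = list(combinations(population, 2))
    return sum(normalized_hamming(a, b) for a, b in pairs) / len(pairs)


d = normalized_hamming([1, 0, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0])  # 2 of 6 positions differ
diversity = mean_pairwise_diversity([[0, 0], [1, 1], [0, 1]])
```

A value of d close to 0 means two chromosomes are nearly identical, so a population whose average distance shrinks towards 0 is losing diversity.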

4. Experimental Results

4.1 Yeast’s Interaction Database

The latest BioGRID version, 3.4.154, is an online database that contains genetic interactions of organisms on a large scale [4]. It is updated regularly on the basis of experimental findings reported by biologists. Version 2.0.50 of the same database is also used to compare the results with the existing algorithms. Both versions are two-dimensional data tables, with 688,014 lines (3.4.154) and 140,849 lines (2.0.50), in which each line holds interaction information about a pair of proteins. The tables also record the types of experiments used to detect the interactions, which can be used to compute the weight of each interaction edge.

Gold standard pathways:

The orientation that the algorithm produces for the yeast PPI network is compared with known pathways to validate the orientation procedure, that is, to confirm that it not only achieves good fitness function values but also produces biologically meaningful results. All yeast signalling pathways are obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. A signalling pathway in this database that has been confirmed by experiment is termed a gold standard pathway, and a group of gold standard pathways is called a gold standard network. The overlap between the predicted pathways and the gold standard pathways is used to evaluate the algorithm.

4.2 Testing Process

The Depth First Search algorithm is used to find the set of paths from source to target, and a set of conflicting edges is generated [3]. EGA is then used to find a better orientation of the conflicting edges. The results obtained by EGA are compared with those of the Random Orientation Algorithm plus Local Search (ROLS), which was found to perform better than other algorithms such as MAX-CSP and MIN-SAT.
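The path enumeration step can be sketched with an iterative depth-first search that is cut off at k edges. The toy network and the function names are hypothetical, and the definition of a conflicting edge used here (an edge traversed in opposite directions by different paths) is an assumption, since the text does not define the term.

```python
def find_paths(adj, source, target, k):
    """All simple paths from source to target with at most k edges (iterative DFS)."""
    paths = []
    stack = [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        if len(path) - 1 == k:
            continue  # length limit reached without hitting the target
        for nxt in adj.get(node, ()):
            if nxt not in path:
                stack.append(path + [nxt])
    return paths


def conflicting_edges(paths):
    """Edges that different paths traverse in opposite directions (assumed definition)."""
    directed = set()
    for path in paths:
        directed.update(zip(path, path[1:]))
    return {frozenset((a, b)) for a, b in directed if (b, a) in directed}


# Hypothetical undirected toy network given as an adjacency list.
adj = {"s": ["a", "b"], "a": ["s", "b", "t"], "b": ["s", "a", "t"], "t": ["a", "b"]}
short_paths = find_paths(adj, "s", "t", k=2)
all_paths = find_paths(adj, "s", "t", k=3)
conflicts = conflicting_edges(all_paths)
```

With k = 3 the paths s → a → b → t and s → b → a → t use the edge between a and b in opposite directions, so no single orientation can satisfy both; this is the kind of edge on which an orientation algorithm has to make a choice.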

4.3 Results and Discussion

The performance of EGA is analyzed in terms of runtime, fitness function and biological validation.

4.3.1 Algorithm's runtime

The runtime reflects the total number of operations executed: the more operations an algorithm performs, the longer its running time. The yeast interaction network is used to analyze the runtime of the orientation algorithms. Various combinations of maximum path length and number of source-target pairs are tested for EGA, GA, ROLS, MAX-CSP and MIN-SAT, and the result of each algorithm is shown in Table 1. In all cases, for k = 4 to 6 and as the number of source-target pairs increases, EGA has a lower runtime than all the other algorithms. Sixteen sources and sixteen targets derived from the gold standard signalling pathways were used.

Table 1

Comparison of runtime of the proposed algorithm with existing algorithms
Sources	Targets	k	GA	ROLS	MAX-CSP	MIN-SAT	EGA
4	4	4	0.604	0.594	0.758	0.876	0.581
8	8	4	0.982	1.021	1.331	1.541	0.912
16	16	4	3.432	3.129	4.935	5.344	2.453
4	4	5	5.021	5.323	5.458	6.986	4.254
8	8	5	23.765	21.599	25.965	29.108	17.755
16	16	5	211.067	185.211	254.458	332.560	167.844
4	4	6	289.239	281.994	314.806	399.309	219.003
8	8	6	1687.770	1945.749	2334.105	2761.500	1237.784
16	16	6	14,445	16,986.165	21,452.090	28,673.518	10,348.670

4.3.2 Performance Measures using fitness function

The fitness function in Eq. (2) maximizes the total weight of the satisfied paths. Table 2 and Fig. 3 show that EGA achieves higher fitness values than all the existing algorithms.

Table 2

Comparison of fitness values of the proposed algorithm with existing algorithms
BioGRID Database/Algorithm	EGA	GA	ROLS	MAX-CSP	MIN-SAT
BioGRID 3.4.154	24,565	22,453	22,324	20,098	18,564
BioGRID 2.0.50	7,134	6,760	6,455	5,678	4,702

4.3.3 Evaluation of the algorithms using gold standard pathways

The number of gold standard pathways recovered is used as a criterion for the biological validation of each algorithm's ability to find pathways. All the paths found are ranked according to different metrics, such as path weight, edge weight and edge degree.

Table 3

Number of top-ranked predicted paths that correspond to known signalling pathways - BioGRID version 3.4.154
Metrics/Algorithms	EGA	GA	ROLS	MAX-CSP	MIN-SAT
Path weight	68	56	52	41	32
Max edge weight	25	20	18	13	8
Avg edge weight	39	32	29	21	13
Min edge weight	31	28	24	27	14
Edge degree	18	13	16	9	6

The top 100 paths are also selected on the basis of path weight. A satisfied path is said to be completely matched if more than six of its edges are consecutively matched in the gold standard pathways, and partially matched if four to six consecutive edges are found in both the gold standard and the satisfied path [26]. Paths that do not overlap any known pathway in the gold standard network contain new edges. Tables 3 and 4 show the results for the proposed and the existing algorithms (EGA, GA, ROLS, MAX-CSP and MIN-SAT) under different performance metrics, namely path weight, edge weight (max, avg and min) and edge degree (sum of in-degree and out-degree), for the two BioGRID database versions. Figures 4 and 5 plot the evaluation of the algorithms using gold standard pathways.

Table 4

Number of top-ranked predicted paths that correspond to known signalling pathways - BioGRID version 2.0.50
Metrics/Algorithm	EGA	GA	ROLS	MAX-CSP	MIN-SAT
Path weight	37	34	31	21	13
Max edge weight	11	9	7	5	2
Avg edge weight	29	26	29	20	11
Min edge weight	25	20	23	17	8
Edge degree	12	8	9	5	3

4.3.4 Accuracy

The accuracy of an algorithm is the number of correct predictions divided by the total number of predictions, as given in Eq. (7). Table 5 compares the accuracy of the algorithms and shows that EGA achieves higher accuracy than the other algorithms. Figure 6 plots the accuracy of the proposed and the existing algorithms.

Accuracy = \(\frac{\text{No. of correct predictions}}{\text{Total predictions}}\) (7)

Table 5

Comparison of Accuracy for the proposed algorithm with existing algorithms
BioGRID Database/Algorithms	EGA	GA	ROLS	MAX-CSP	MIN-SAT
BioGRID 3.4.154	91	86	85	79	71
BioGRID 2.0.50	88	83	80	74	69

4.3.5 Statistical Measurements

Tables 6 and 7 give the statistical comparison of the path weight and accuracy of the proposed algorithm with the existing algorithms; the proposed algorithm achieves the higher performance in both. Figures 7 and 8 plot these comparisons.

Table 6

Statistical comparison of path weight of the proposed algorithm with existing algorithms (% difference of performance measures of EGA with each algorithm)
Comparison analysis	BioGRID Database	GA	ROLS	MAX-CSP	MIN-SAT
Path Weight	3.4.154	17	23	39	52
Path Weight	2.0.50	8	16	43	64
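The text does not state how the percentage differences are computed. As an inference only, the Table 6 values are reproduced by taking the path-weight rows of Tables 3 and 4 and computing (EGA − other) / EGA × 100, truncated to an integer. The sketch below checks this; the function name `percent_difference` is hypothetical.

```python
def percent_difference(ega, other):
    """Inferred formula: (EGA - other) / EGA * 100, truncated to an integer."""
    return (ega - other) * 100 // ega


# Path-weight rows of Tables 3 and 4, in the order GA, ROLS, MAX-CSP, MIN-SAT.
row_3_4_154 = [percent_difference(68, other) for other in (56, 52, 41, 32)]
row_2_0_50 = [percent_difference(37, other) for other in (34, 31, 21, 13)]
# row_3_4_154 == [17, 23, 39, 52] and row_2_0_50 == [8, 16, 43, 64]
```

Under this reading, each entry gives the share of EGA's matched pathways by which the other algorithm falls short.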

Table 7

Statistical comparison of accuracy of the proposed algorithm with existing algorithms (% difference of performance measures of EGA with each algorithm)
Comparison analysis	BioGRID Database	GA	ROLS	MAX-CSP	MIN-SAT
Algorithm Accuracy	3.4.154	5	6	13	21
Algorithm Accuracy	2.0.50	6	9	15	21

4.4 Discussion

This analysis shows that the EGA method found more good pathways than the other existing algorithms. Here, gold standard pathways are the pathways in the KEGG database. The derived paths are ranked by dividing them into three groups. The first group holds pathways with five or six proteins whose edges are consecutively matched with the gold standard pathways. The second group consists of pathways with six proteins whose edges are matched with the gold standard pathways. These coinciding pathways can be used to derive new pathways that were not previously recorded in the database. The third group consists of pathways that are not present in the gold standard network; these are regarded as the significant pathways in biology.

Figure 9(A) shows the High-Osmolarity Glycerol (HOG) pathway, which contains YIL147C (SLN1) → YDL235C (YPD1) → YLR006C (SSK1) → YCR073C (SSK22) → YJL128C (PBS2) and is also found in [10]. The pheromone pathway is also found, with the flow YDR379W (RGA2) → YLR229C (CDC42) → YHL007C (STE20) → YLR36W (STE11) → YDL149W (STE7) → YBL016W (FUS3) → YHR084W (STE12) → YCL027W (FUS1).

Figure 9(B) shows various newly oriented edges. The paths in the pheromone signalling pathway contain the Ste11 → Ste5 edge. This edge counts as an error because it is oriented in the opposite direction in the gold standard. Protein Ste5 interacts with Fus3, Ste7 and Ste11 and acts as a scaffolding protein to form the active complex [27–29]. Figure 9(C) shows edges that may represent biological hypotheses, as they do not overlap with any pathway in the gold standard network database.

5. Conclusion And Future Enhancement

Pathways play an important role in cell biology. In this research work, signalling pathways are discovered from Protein-Protein Interaction (PPI) networks using the proposed Enhanced Genetic Algorithm. The algorithm reconstructs signalling pathways from the PPI networks of yeast. It recovers many known signalling pathways, and their agreement with the KEGG database is evidence of the correctness of the algorithm. The algorithm also increases the computing speed and achieves the highest accuracy of the compared algorithms on both datasets. The Hamming distance is used to measure the distance between chromosomes in order to overcome problems in the existing work such as lack of population diversity and premature convergence. In future, this work can be applied to other types of pathways, such as metabolic pathways and human disease pathways (for example, cancer pathways), and extended to other organisms such as human and fly.

Declarations

Conflict of interest: There are no conflicts of interest.

Competing interests: The authors declare no competing financial interests.

Data Availability: The code and datasets analyzed during the current study are available from the authors on reasonable request.

References

Fraser, A. G. and Marcotte, E. M. "A probabilistic view of gene function". Nat Genetics. (2004) 36,6, 559–564.
Bardwell L, "A walk-through of the yeast mating pheromone response pathway", Peptides 25 (2004) 1465–1476.
Chen Y, Xu D. “Understanding protein dispensability through machine-learning analysis of high-throughput data.” Bioinformatics, (2005) ,21:575–581.
Segal E, Shapira M, Regev A, Peer D, Botstein D, Koller D, Friedman N, "Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data", Nat. Genet. 34 (2003) 166–176.
Grzegorczyk M, Husmeier D, "Improvements in the reconstruction of time-varying gene regulatory networks: dynamic programming and regularization by information sharing among genes", Bioinformatics 27 (2011) 693–699.
Liu G.X, Feng W, Wang H, Liu L, Zhou C.G, "Reconstruction of gene regulatory networks based on two-stage Bayesian network structure learning algorithm", J. Bionic Eng. 6 (2009) 86–92.
Ravcheev D.A, Best A.A, Sernova N.V, Kazanov M.D, Novichkov P.S, Rodionov D.A, "Genomic reconstruction of transcriptional regulatory networks in lactic acid bacteria", BMC Genomics 14 (2013) 14–94.
Margolin A.A, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera R.D, Califano A, "Aracne: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context", BMC Bioinf. 7 (2006) S7.
Bebek,G. and Yang,J., “PathFinder: mining signal transduction pathway segments from protein-protein interaction networks”, BMC Bioinformatics, (2007),8, 335
Gitter A, Klein-Seetharaman J, Gupta A, Bar-Joseph Z, "Discovering pathways by orienting edges in protein interaction networks", Nucleic Acids Res. 39 (2011)
Scott,J., Ideker,T., Karp,R.M. and Sharan,R. “Efficient algorithms for detecting signalling pathways in protein interaction networks”. J. Comput. Biol., (2006) 13, 133–144.
Kitagawa J, Iba H, "Identifying metabolic pathways and gene regulation networks with evolutionary algorithms", Evol. Comput. Bioinforma. (2003) 255–275.
Fischer E, Sauer U, "Large-scale in vivo flux analysis shows rigidity and suboptimal performance of bacillus subtilis metabolism", Nat. Genet. 37 (2005) 636–640.
Ruppin E, Papin J.A, Figueiredo L.F, Schuster S, "Metabolic reconstruction, constraint-based analysis and game theory to probe genome-scale metabolic networks", Curr. Opin. Biotechnol. 21 (2010) 502–510.
Steffen M, Petti A, Aach J, D’haeseleer P and Church G,” Automated modelling of signal transduction networks”. BMC Bioinformatics,(2012), pp 3, 34.
Kanehisa M. and Goto S. "KEGG: Kyoto Encyclopedia of Genes and Genomes". Nucleic Acids Res., (2000), 28, 27–30.
Nguyen H.A , Vu C.L , Tu M.P , LamBui T " Discovery of pathways in protein-protein interaction networks using a genetic algorithm ". Data & Knowledge Engineering 96–97 (2015) 19–31
Medvedovsky,A., Bafna,V., Zwick,U. and Sharan,R. “An algorithm for orienting graphs based on cause-effect pairs and its applications to orienting protein networks” In Proceedings of the 8th international workshop on Algorithms in Bioinformatics. Karlsruhe, Germany,(2008), pp. 222–232
Xiong W, Xie L, Zhou S, Guan J, "Active learning for protein function prediction in protein-protein interaction networks", Neurocomputing 145 (2014) 44–52.
Kohli R., Krishnamurti R. and Mirchandani P. “The minimum satisfiability problem” SIAM J. Discret. Math., (1994). 7, 275–283
Shlomi T, Segal D, Ruppin E, Sharan R: “QPath: a method for querying pathways in a protein-protein interaction network”. BMC Bioinformatics (2006), 7:199
Bueno R, Traina A.J, Jr C.T, "Genetic algorithms for approximate similarity queries", Data Knowl. Eng. 62 (2007) 459–482.
Anh N.H, Long V.C, Phuong T.M, Lam B.T, "A genetic-based approach for discovering pathways in protein–protein interaction networks", Proceedings of SoCPaR2013, (2013).
Charikar M, Makarychev K, Makarychev Y," Near-optimal algorithms for maximum constraint satisfaction problems", ACM Trans. Alg. 5 (2009) 1–14.
Fu W, Sanders-Beer B, Katz K, Maglott D , Pruitt K, Ptak R, "Human immunodeficiency virus type 1, human protein interaction database at NCBI", Nucleic Acid Res. 37 (2009) 417–422.
Gitter A, Klein-Seetharaman J, Gupta A, Bar-Joseph Z, "Supporting information, discovering pathways by orienting edges in protein interaction networks",http://sb.cs.cmu.edu/OrientEdges/ (2012).
Inouye C, Dhillon N, Durfee T, Zambryski P, Thorner J, "Mutational analysis of ste5 in the yeast Saccharomyces cerevisiae: application of a differential interaction trap assay for examining protein–protein interactions", Genetics 147 (1997) 479–492
Bardwell, "A walk-through of the yeast mating pheromone response pathway", Peptides 25 (2004) 1465–1476.
Dowell S.J, Bishop A.L, Dyos S.L, Brown A.J, Whiteway M.S, "Mapping of a yeast g protein beta gamma signalling interaction", Genetics 150 (1998) 1407–1417.
Manikandan, P., Ramyachitra, D. Bacterial Foraging Optimization –Genetic Algorithm for Multiple Sequence Alignment with Multi-Objectives. Sci Rep 7, 8833 (2017). https://doi.org/10.1038/s41598-017-09499-1


Reviewers agreed at journal
02 Mar, 2023
Editor assigned by journal
09 Mar, 2022
First submitted to journal
08 Mar, 2022
