3.1. Identification of topological modules
Two PPI networks were constructed from the two complex-related PPI datasets described above: NetH (the network provided by Hu et al.) and NetR (the network provided by Rajagopala et al.). In this study, the protein complexes and functional modules of E. coli are analyzed on the basis of these two networks. ELPA predicts 120 topological modules in NetH and 171 in NetR. In complex biological networks, small and sparse modules have been shown to perform important biological functions, so identifying them is as important as identifying large and dense ones. ELPA can uncover small and sparse modules in addition to large and dense ones. Because many gold-standard protein complexes contain only two proteins, topological modules consisting of only two proteins are retained. The predicted topological modules range in size from two to hundreds of proteins. Furthermore, we found that many topological modules detected by ELPA overlap with each other. This agrees well with real protein complexes and functional modules, in which some proteins are involved in multiple complexes or functional modules. The study of proteins shared by different complexes or functional modules is itself an important research topic.
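As an illustration of how overlapping modules can be inspected, the following minimal sketch lists the proteins that belong to more than one module. The function name and the toy modules are hypothetical and are not taken from NetH or NetR; the sketch only assumes that each module is available as a set of protein names.

```python
from collections import defaultdict


def overlapping_proteins(modules):
    """Map each protein found in more than one module to the ids of those modules."""
    membership = defaultdict(set)
    for module_id, proteins in modules.items():
        for protein in proteins:
            membership[protein].add(module_id)
    # Keep only proteins shared by at least two modules.
    return {p: ids for p, ids in membership.items() if len(ids) > 1}


# Toy modules with hypothetical protein names (not from NetH or NetR).
toy_modules = {
    "M1": {"p1", "p2", "p3"},
    "M2": {"p3", "p4"},
    "M3": {"p5", "p6"},
}
shared = overlapping_proteins(toy_modules)
print(shared)
```

In this toy input only p3 is shared, by M1 and M2; M3 does not overlap with any other module.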
3.2. Identification of protein complexes
The protein modules predicted by ELPA in NetH and in NetR were each matched against the 297 gold-standard benchmark complexes of the EcoCyc dataset. In NetH, 222 benchmark complexes (88.1%) matched 91 predicted protein modules (75.8%); that is, each matched protein module corresponds to one or more real complexes. Most of the benchmark complexes consist of no more than ten proteins, so the larger protein modules are expected to contain multiple complexes. In NetR, by contrast, 134 benchmark complexes (53.2%) matched 70 predicted protein modules (40.9%). This comparison indicates that the scale of the PPI network strongly influences the matching quality. To evaluate the predicted complexes, the set of effectively matched complexes was obtained by selecting those whose matching score exceeds a threshold. Precision, Recall and F-measure were then used to evaluate the quality of each module on the basis of these effectively matched complexes. The results show that most protein complexes predicted by ELPA match the corresponding real complexes well in both NetH and NetR. For example, in NetH, the 50th protein module consists of 8 proteins; among them, potF, potH and potI are three proteins of the putrescine ABC transporter complex, and potA, potB, potC and potD cover all four proteins of the putrescine/spermidine ABC transporter complex (shown in Fig. 2a). The 54th protein module consists of 5 proteins and is fully covered by two complexes: the ferrichrome transport system and the ferric coprogen transport system. fhuA, fhuB, fhuC and fhuD are ferrichrome transport system proteins, while fhuB, fhuC, fhuD and fhuE are ferric coprogen transport system proteins. Figure 2b shows that fhuB, fhuC and fhuD are common to the two complexes. As Fig. 3 shows, in NetR, among the 8 proteins of the 76th protein module, ccmA, ccmC, ccmD and ccmE are Protoheme IX ABC transporter proteins, while ccmE, ccmF and ccmH are CcmEFGH holocytochrome synthetase proteins; ccmE links the two complexes.
The above analysis shows that ELPA can effectively predict the protein complexes embedded in the E. coli PPI networks.
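The matching and evaluation procedure can be sketched as follows. The text does not define the matching score or the threshold, so this sketch makes several assumptions. It uses the overlap score |A ∩ B|^2 / (|A| |B|), a common choice in complex prediction studies, and a hypothetical threshold of 0.2. It takes Precision as the fraction of predicted modules that match at least one benchmark complex, and Recall as the fraction of benchmark complexes matched by at least one module. The function names are illustrative only.

```python
def overlap_score(module, complex_):
    """Assumed matching score: |A ∩ B|^2 / (|A| * |B|)."""
    shared = len(module & complex_)
    return shared * shared / (len(module) * len(complex_))


def evaluate(modules, complexes, threshold=0.2):
    """Return (precision, recall, f_measure) under the assumed matching rule.

    The threshold value is a hypothetical default, not taken from the paper.
    """
    matched_modules = sum(
        any(overlap_score(m, c) > threshold for c in complexes) for m in modules
    )
    matched_complexes = sum(
        any(overlap_score(m, c) > threshold for m in modules) for c in complexes
    )
    precision = matched_modules / len(modules)
    recall = matched_complexes / len(complexes)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# The 54th module of NetH and its two complexes, as listed in the text.
module_54 = {"fhuA", "fhuB", "fhuC", "fhuD", "fhuE"}
ferrichrome = {"fhuA", "fhuB", "fhuC", "fhuD"}
coprogen = {"fhuB", "fhuC", "fhuD", "fhuE"}
unmatched = {"x1", "x2"}  # hypothetical module with no benchmark counterpart

precision, recall, f1 = evaluate([module_54, unmatched], [ferrichrome, coprogen])
print(overlap_score(module_54, ferrichrome), precision, recall, f1)
```

Under these assumptions, the 54th module scores 16/20 = 0.8 against each of the two transport complexes, so both benchmark complexes are matched by a single module, while the hypothetical module lowers Precision to 0.5.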
3.3. Identification of functional modules
To determine whether the predicted protein modules have biological significance, each topological module was analyzed against the gold-standard protein functional annotations of the EcoCyc dataset. The topological modules detected by ELPA in NetH and in NetR were each matched against the benchmark functional annotations. If the majority of the proteins (> 50%) of a predicted topological module are covered by a single functional term, the module is defined as a significant functional module. In NetH, most of the predicted modules (82.5%) are significant functional modules, and about 30% of them match perfectly (fully covered by a single functional term). For example, as Fig. 4a shows, 24 out of the 25 proteins of the 19th protein module are annotated by GO:0006810, so the functions of these proteins are clearly similar; all 7 proteins of the 40th protein module are covered by each of GO:0005886, GO:0016020 and GO:0017004 (shown in Fig. 4b). In NetR, 60.8% of the predicted modules are significant functional modules, and 24.7% of them are fully covered by a single functional term. For example, as Fig. 5 shows, malY, malT, fixB, ybdM, recA and aes are annotated by GO:0005515; aes, recA and yhfW are annotated by GO:0005737; and yhfW and pyrC are annotated by GO:0046872. Clearly, some proteins have more than one function. The above analysis shows that ELPA can also effectively detect the functional modules embedded in the E. coli PPI networks.
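The majority rule can be sketched as follows. The function names are hypothetical. The toy input assumes that the NetR module of Fig. 5 consists of exactly the eight proteins named in the text, which the text does not state explicitly, and it includes only the three GO terms mentioned there.

```python
def best_term_coverage(module, annotations):
    """Return (term, fraction) for the term that annotates the most proteins of the module."""
    counts = {}
    for protein in module:
        for term in annotations.get(protein, ()):
            counts[term] = counts.get(term, 0) + 1
    if not counts:
        return None, 0.0
    term = max(counts, key=counts.get)
    return term, counts[term] / len(module)


def is_significant(module, annotations):
    """A module is significant if a single term covers more than 50% of its proteins."""
    return best_term_coverage(module, annotations)[1] > 0.5


# Annotations as listed in the text for Fig. 5 (assumed to be the complete module).
annotations = {
    "malY": {"GO:0005515"},
    "malT": {"GO:0005515"},
    "fixB": {"GO:0005515"},
    "ybdM": {"GO:0005515"},
    "recA": {"GO:0005515", "GO:0005737"},
    "aes": {"GO:0005515", "GO:0005737"},
    "yhfW": {"GO:0005737", "GO:0046872"},
    "pyrC": {"GO:0046872"},
}
module = set(annotations)
term, fraction = best_term_coverage(module, annotations)
print(term, fraction, is_significant(module, annotations))
```

Under this assumption, GO:0005515 covers 6 of the 8 proteins (75%), so the module counts as a significant functional module but not as a perfect match.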
3.4. Comparative evaluations
Most clustering methods for complex networks are based on node clustering; among them, MCL has been shown to outperform other methods in identifying functional modules or protein complexes in most cases [21, 22]. ELPA is a novel edge clustering method that considers both node and link attributes and can better reflect the network structure [16, 23]. We next compare the clustering results of ELPA and MCL on the same PPI networks. ELPA is parameter-free, and MCL is run with its default parameters.
To compare the performance of MCL and ELPA, three metrics (Precision, Recall and F-measure) are used to evaluate the quality of the predicted protein complexes. Figure 6a compares the two methods in terms of the match between the topological modules of NetH and the corresponding protein complexes. ELPA is slightly more accurate than MCL overall: the precision, recall and F-measure of ELPA are 72.5%, 61.5% and 66.5%, whereas those of MCL are 55.1%, 65.5% and 59.9%, respectively. ELPA therefore achieves higher precision and F-measure, while MCL achieves slightly higher recall. Figure 6b compares the two methods in terms of the match between the topological modules of NetH and the corresponding functional modules. The effective matching rate and the average matching rate of significant functional modules are used to evaluate the quality of the predicted functional modules. The effective matching rate and average matching rate of ELPA are 82.5% and 70.9%, whereas those of MCL are 85.9% and 70%, respectively. Similar results are obtained in NetR. As Fig. 7a shows, the precision, recall and F-measure of ELPA are 35.7%, 28.6% and 31.8%, whereas those of MCL are 24.3%, 32.1% and 27.7%, respectively. As Fig. 7b shows, the effective matching rate and average matching rate of ELPA are 60.8% and 74.3%, whereas those of MCL are 68.2% and 74.4%, respectively. These results show that ELPA is an effective method for predicting the protein complexes and functional modules of E. coli.
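As a consistency check on the reported figures, the short sketch below recomputes the F-measure from the reported precision and recall. It assumes that the F-measure is the usual harmonic mean of the two, which the text does not state explicitly.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed definition)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the text.
reported = {
    ("NetH", "ELPA"): (0.725, 0.615),
    ("NetH", "MCL"): (0.551, 0.655),
    ("NetR", "ELPA"): (0.357, 0.286),
    ("NetR", "MCL"): (0.243, 0.321),
}

f_percent = {
    key: round(100 * f_measure(p, r), 1) for key, (p, r) in reported.items()
}
for key, value in f_percent.items():
    print(key, value)
```

The recomputed values (66.5, 59.9, 31.8 and 27.7) agree with the reported F-measures. This confirms that the advantage of ELPA in F-measure comes from its higher precision, since its recall is lower than that of MCL in both networks.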
3.5. Comparative analysis of protein complexes and functional modules
PPIs can be divided into permanent interactions and transient interactions. Permanent interactions are strong and stable and give rise to protein complexes, whereas transient interactions vary with cellular processes and form functional modules. A comparative analysis of protein complexes and the corresponding functional modules is therefore of great scientific interest. For example, as Fig. 8a shows, the 21st protein module of NetH mainly consists of three real complexes: NADH: ubiquinone oxidoreductase I (10 of the 11 proteins of this complex match this module), hydrogenase 4 (all 5 proteins of this complex are covered by this module) and the formate hydrogenlyase complex (3 of the 5 proteins of this complex match this module). Moreover, GO:0055114 covers all 18 proteins of this module. Among them, hycE, hycF and hycG are part of the formate hydrogenlyase complex, and their functional annotations are hydrogenase 3; formate hydrogenlyase complex iron-sulfur protein; and hydrogenase 3 and formate hydrogenlyase complex-HycG subunit, respectively. This means that the functions of the three proteins agree with the formate hydrogenlyase complex, and it also suggests that the hydrogenase 3 complex is related to the formate hydrogenlyase complex. Hydrogenase 4 consists of hyfB, hyfD, hyfF, hyfG and hyfI, whose functional annotations are hydrogenase 4 component B, component D, component F, large subunit and small subunit, respectively. This is highly consistent with the function of the hydrogenase 4 complex. The remaining ten proteins (nuoB, nuoC, nuoE, nuoF, nuoG, nuoH, nuoI, nuoL, nuoM and nuoN) are all related to the function of the NADH: ubiquinone oxidoreductase complex. Figure 8b shows similar results in NetR. For instance, in the 52nd protein module, malG, malE, malF and malK are part of the maltose ABC transporter complex (4 of the 5 proteins of this complex match this module), and all of them match GO:0015768, GO:0042956 and GO:0043190.
The functional annotations of malE, malG/malF and malK are maltose ABC transporter-periplasmic binding protein, maltose ABC transporter-membrane subunit and maltose ABC transporter-ATP binding subunit, respectively. These results indicate that protein complexes are closely related to their corresponding functional modules.