We implemented the FastKnock algorithm using Python language programming (Version 2.7) and the COBRApy library (Version 0.15.4) [63]. Our source code is publicly available at https://github.com/leilahsn/FastKnock and also as supplementary material. We evaluated the performance of FastKnock using various examples, and we compared these results to an alternative approach.
3.1 FastKnock results for E. coli models
To evaluate FastKnock’s performance, we selected three highly curated GEMs for E. coli (i.e., iJR904 [17], iAF1260 [18], and iJO1366 [19]) for our experiments to overproduce some well-known metabolites, including succinate, lactate, 2-oxoglutarate, and lycopene as the both primary and secondary biological products.
We tested the production of primary metabolites focusing on two cultivation conditions: The first condition is CM1: iM9 medium supplemented with glucose (a maximum allowable glucose uptake rate of 10 mmol.gDW-1h-1) under aerobic conditions (a maximum allowable oxygen uptake rate of 15 mmol.gDW-1h-1). The second condition is CM2: iM9 medium supplemented with glucose (a maximum allowable glucose uptake rate of 10 mmol.gDW-1h-1) under anaerobic conditions (an oxygen uptake rate of 0 mmol.gDW-1.h-1).
Many of the models’ reactions are not active in the minimal iM9 medium. In a complex and rich environment, due to the activation of more inputs to the cell, more pathways and consequently more reactions are active in the network. Hence, in order to further evaluate the exhaustive enumeration performance of the FastKnock algorithm in a rich cultivation condition, we conducted additional in silico experiments considering Luri-Bertani (LB) medium. The iLB medium constraints were intended based on [64], [65]. We deployed the two highly- curated E. coli GEMs (i.e., iJR904 and iML1515 [20]) for the experiments. The input settings (i.e., exchange fluxes) to define the mediums for the models used in the current study are listed in the exchanges.xls file.
The secondary metabolite, lycopene, is produced in the cell only under aerobic conditions. We considered two strains for lycopene production. For the first strain (Strain1), the lycopene biosynthesis pathway (i.e., the methylerythritol phosphate (MEP) pathway [66]) is added to the wild-type E. coli model [39][67][68]. For the second strain (Strain2), some other modifications are applied based on [69]. This provides an intracellular pool of pyruvate as the important precursor of lycopene production [70]. Tables 1 and 2 in Supplement J.I show the maximum theoretical yield for the biosynthesis of the metabolites (i.e, maximum of Vchemical) and our threshold for their production (Thc hemical = 0.05 Vchemical).
The result of the preprocessing phase is shown in Table 2 of Supplement J.I, which demonstrates the number of reactions that are excluded from the search space before the main exploration procedure is applied and before the removable reactions are obtained. The size of the search space is drastically reduced to 20% of all the reactions. In the Reduced_model, the blocked reactions and dead ends are removed [62]. Also, as described in Section 2, after the preprocessing phase, the search space is reduced iteratively and temporally during the search procedure of the FastKnock algorithm. This significantly reduces the number of linear programming problems (LPs) that must be solved. Specifically, compared to an exhaustive search, the reduction rates are 78%–85% for single knockouts, 95%–97% for double knockouts, 99.0–99.5% for triple knockouts, and above 99.8% for quadruple and quintuple knockouts (Table 1). The number of LPs is equal to the number of nodes in the traversal tree shown in Figure 1, and it is independent of the target metabolite to be produced.
In comparison, in the exhaustive search the algorithm must check all the combinations of the reactions in the search space. For instance, iJR904 in CM2 has 208 reactions in its search space. For finding double-knockout results in the exhaustive search, the algorithm must check all the double combinations of the elements in the search space (c(208, 2) = 21,528). Due to its time complexity, the exhaustive approach is not feasible for high-order reaction knockouts; thus, we compared FastKnock to a simple exhaustive search method for single, double, or triple knockouts. Our experiments showed that a significant reduction in the number of LPs is critical because it allows us to investigate and find all possible solutions.
Table 2 presents the total number of solutions obtained using the FastKnock algorithm. The results are reported in two cases: the maximum production rate (Ratemax) and the guaranteed production rate (Rategrnt) as discussed in Supplement I.
We also compared our solutions to the results of the exhaustive search for single, double, and triple deletions for succinate production in iJR904 to verify the completeness of the FastKnock algorithm. Both approaches found two solutions for a single deletion. The exhaustive search for a double deletion found 398 solutions, of which only 58 solutions were true double deletions. The rest of the solutions were not acceptable because either (a) the combination of each single deletion solution and a zero-flux reaction was inappropriately considered as a double-deletion solution or (b) the elimination of a reaction in the co-knocked-out sets led to the removal of all the reactions in the set, while in the exhaustive search, the removal of each reaction in the set is counted as a separate solution. For triple deletions, the exhaustive search found 39,407 solutions, of which 887 were unique and acceptable. FastKnock found all the 887 solutions.
Table 3 presents the best solutions in iJR904 GEM as Rategrnt mode. Supplement J.II shows the results for the iAF1260 and iJO1366 GEMs as well as the maximized solutions. As an example, we found that the best result for succinate overproduction is obtained by deleting one reaction, ADHEr, which is knocked out by the deletion of the gene b1241. Consequently, the deletion of the b1241 gene also causes the deletion of the LCADi_copy2 reaction. In this situation, the growth rate is 0.16 (1/h) as shown in the biomass formation rate column. After the deletion of ADHEr, the succinate production can vary between 5.11 and 9.50 mmol.gDW-1h-1, which is more than the 0.85 mmol.gDW-1h-1 threshold; hence, an acceptable amount of succinate production is guaranteed. Moreover, Table 3 presents the production envelopes calculated for succinate production (Figure 2).
For practical applications, various evaluation indices, including product yield, SSP, and SoGC [55], and other important indices reflecting environmental and operational considerations, can be used to choose the most appropriate cases from the solutions found by FastKnock (Table 4 and Table 5). In particular, the feasibility of CO2 biofixation and minimal production of undesired or toxic byproducts are also significant indexes for systems metabolic engineering purposes. For instance, an engineered strain that can simultaneously fix CO2 and produce a suitable biochemical might be preferred regarding environmental considerations. When all solutions are available, the analysis and identification of such appropriate cases is easily possible.
We analyzed FastKnock solutions in order to find the most appropriate solutions based on three criteria, yield, SSP, and SoGC (Table 5). Additionally, the feasibility of CO2 biofixation is also examined and the relevant results are summarized, where a negative CO2 exchange flux represents a desirable CO2 uptake rate. We compared these best solutions obtained by FastKnock with the associated OptKnock results as well as experimental data available in the literature [71][72][73]. Note that Optknock aims at, and terminates on, finding a single solution. Therefore, comparing it vs. FastKnock in terms of computational costs is not meaningful.
We found that a solution with the best production rate or an optimal solution of the optimization algorithms such as OptKnock does not necessarily bring the best SoGC and the other desired indexes. However, by identifying all the possible solutions for the problem, FastKnock allows a comprehensive analysis. For example, knocking out ADHEr, ATPS4r, and LDH_D is expected to lead to the best biomass formation rate (0.16 h-1) and the highest SoGC (3.01 h-1), which is twice the best SoGC provided by OptKnock solutions while the other indices corresponding to this knockout are comparable with the best numbers shown in the table (i.e., a production rate of 8.90 vs. 12.24 mmol.gDW-1.h-1, a yield of 0.89 vs. 1.22, an SSP 1.42 vs. 1.46 h-1, and a CO2 uptake rate of -8.76 vs. -9.36 mmol.gDW-1.h-1). A relatively high value of SoGC can also be desirable from a dynamic perspective because it indicates that even under non-optimal conditions, the biosynthesis of the target biochemical is coupled with the growth of the production strain. This situation is usually encountered in batch and fed-batch cultivations in the logarithmic phase of growth.
A more striking example is the comparison between the PTAr, PYK, ATPS4r, and SUCD1i quadruple knockout identified by OptKnock with the two solutions with the best production rate (ADHEr, LDH_D, PFL, and THD2) and the best SoGC (ADHEr, LDH_D, HEX1, and THD2) identified by FastKnock. While the biomass formation rate of the FastKnock solutions (0.11, 0.13 h-1, respectively) are comparable with the OptKnock solution (0.16 h-1), the yield and SSP is an order of magnitude higher for FastKnock solutions. A serious issue with this OptKnock solution is the very low SoGC (0.01 h-1), which indicates that the production rate would be hardly coupled with growth. In comparison, the predicted SoGC for FastKnock solutions are 2.85 and 3.09 h-1, respectively. Another disadvantage of OptKnock solution is a relatively high CO2 production rate of 9.03 mmol.gDW-1.h-1 while in the FastKnock solutions the CO2 uptake rates are -6.12 and -8.77 mmol.gDW-1.h-1, respectively.
Among the quintuple knockouts, the predicted SSP and SoGC for one of the FastKnock solutions (ADHEr, LDH_D, GLUDy, PFL, and THD2) are almost twice those of the OptKnock solution (ADHEr, LDH_D, PTAr, PYK, and GLCpts) while the other indices are comparable.
3.2 Comparing FastKnock to MCSEnumerator (case study: ethanol overproduction in E. coli AF1260)
As mentioned previously, MCSEnumerator is a novel method for metabolic engineering based on the identification of minimal cut sets [50]. This approach applies a filtering step to reduce the computation time, which allows the user to find thousands (but not all) of the most efficient knockout strategies in genome-scale metabolic models. MCSEnumerator can be used to find a large number of metabolic engineering interventions, but it has various drawbacks. In this section, we compare MCSEnumerator with FastKnock. To aid in this comparison, we consider the case study of ethanol production in E. coli iAF1260 GEM with an 18.5 mmol.gDW-1h-1 glucose uptake rate under anaerobic conditions (iM9 medium) as presented in the MCSEnumerator publication.
We should discuss the effect of the MCSEnumerator thresholds on its solution set. It would not be feasible to apply MCSEnumerator using thresholds that are relaxed enough to find all the solutions (Supplement H). We illustrate this with an example in Figure 3. The blue production envelope, which has the best SoGC value, is associated with a solution found by both MCSEnumerator and FastKnock. The associated solutions (with the red and green diagrams), which are the worst cases among the shown envelopes, were not found by MCSEnumerator because of the production threshold considered. This illustrates the efficiency of the primary filtration of the MCSEnumerator method. The starting point might not be the best factor for filtering appropriate solutions. For example, the minimum production rate based on the orange envelope is similar to the green envelope in Region Y3, which is below the threshold considered for ethanol production flux. Nevertheless, the orange envelope may still be associated with a proper solution due to its relatively high SoGC, but it was not found by MCSEnumerator.
Furthermore, the predetermined thresholds may lead to the fact that some of the solutions obtained by MCSEnumerator are not necessarily and truly minimal. It means that an appropriate solution of cardinality (n) may exist and not be found while it appears in some higher-order solutions (>n) which contain irrelevant additional reactions.
While the MCSEnumerator algorithm and its modified editions may have shorter execution times, the number of solutions they can provide with certain settings is only a very small percentage of the total potential solutions. Therefore, comparing the MCSEnumerator and FastKnock algorithms based solely on execution time is not rational since these algorithms neither produce the same output nor have the same objective.