FastKnock: An efficient next-generation approach to identify all knockout strategies for strain optimization

Overproduction of desired native or nonnative biochemical(s) in (micro)organisms can be achieved through metabolic engineering. Appropriate rewiring of cell metabolism is performed making rational changes such as insertion, up-/down-regulation and knockout of genes and consequently metabolic reactions. Finding appropriate targets (including proper sets of reactions to be knocked out) for metabolic engineering to design optimal production strains has been the goal of a number of computational algorithms. We developed FastKnock, an efficient next-generation algorithm for identifying all possible knockout strategies for the growth-coupled overproduction of biochemical(s) of interest. We achieve this by developing a special depth-first traversal algorithm that allows us to prune the search space significantly. This leads to a drastic reduction in execution time. We evaluate the performance of the FastKnock algorithm using three Escherichia coli genome-scale metabolic models in different conditions (minimal and rich mediums) for the overproduction of a number of desired metabolites. FastKnock efficiently prunes the search space to less than 0.2% for quadruple and 0.02% for quintuple-reaction knockouts. Compared to the classic approaches such as OptKnock and the state-of-the-art techniques such as MCSEnumerator methods, FastKnock found many more useful and important practical solutions. The availability of all the solutions provides the opportunity to further characterize and select the most appropriate intervention strategy based on any desired evaluation index. Our implementation of the FastKnock method in Python is publicly available at https://github.com/leilahsn/FastKnock.

In a nutshell, primitive top-down approaches use optimization methods to nd an optimal solution at the cost of signi cant execution time. While top-down metaheuristic approaches require less computational resources, they are not guaranteed to nd a globally optimal solution because the search space contains many local optima. On the other hand, bottom-up approaches can be used to nd a set of potential solution candidates [14]. Despite various integrated computational and experimental studies, it is challenging to identify the most proper and operative alterations by only comparing the ux distributions of the wild-type to the ideally engineered states. Considering high order cardinalities and interventions [47] adds to the complexity of the problem.
State-of-the-art approaches have been developed to dramatically alleviate the computational challenges and signi cantly reduce the computational costs including (iteratively) pruning the search space [48] [49] and sequentially enumerating the smallest minimal cut sets (MCSs) in order to provide several solutions [50]. For example, Fast-SL properly explores a metabolic network of interest to nd the most appropriate synthetic lethal reaction sets. Fast-SL improves the performance of a brute-force search algorithm by iteratively reducing the size of the search space, which substantially shortens the execution time [49]. MCSEnumerator is another novel method that attempts to nd many solutions using MCSs aimed at the identi cation of either synthetic lethal sets or optimal strain design targets [50].
Calculating the MCSs in GEMs is a complex and challenging computational problem [51]. The scalability of MCSEnumerator algorithms paves the way for both theoretical and practical studies considering high order simultaneous reaction interventions for strong growth-coupled product formation [52] [53].
However, for in silico strain design, the MCSEnumerator approach require prede ning of the acceptable thresholds for growth and target product yields and this contributes to different drawbacks such as neglection of some appropriate suboptimal solutions [54].
In this paper, we present FastKnock as a next-generation knockout strategy algorithm that provides the user with all possible solutions for multiple gene and reaction knockouts to overproduce a (bio)chemical of interest. Unlike the MCSEnumerator approach, FastKnock does not rely on any special parameter settings and additional assumptions (except for prede ning the maximum number of simultaneous reaction knockouts). We developed a delicate search and prune algorithm to accomplish this goal at a greatly reduced computational time and cost. Our method combines (and bene ts from) both basic approaches to tackle the problems described above. It incorporates reaction knockouts to couple the biosynthesis of both primary (e.g., succinate, lactate, ethanol, etc.) and secondary metabolites with cell reproduction. The secondary metabolites include native, e.g., dodecanoic acid, and heterologous biochemicals (e.g., polyketides such as erythromycin and terpenoids such as lycopene). It examines the GEM at the level of metabolic reactions while checking the corresponding genes to consider the gene dependency of the reactions.
The availability of all solutions allows us to systematically characterize and rank these strategies in accordance with some criteria including (a) substratespeci c productivity (SSP) [14][15] [55][56], (b) the strength of growth coupling (SoGC), de ned as the square of the product yield per unit substrate divided by the slope of the lower edge of the production curve [14][15] [55][56], (c) strain dynamic performance, which depends on yield, productivity, and titer [57][58], and (d) other important indices re ecting environmental and operational considerations such as the feasibility of CO 2 bio xation and minimal production of undesired or toxic byproducts. Some alternative criteria are discussed in [59]. Furthermore, it would be possible to evaluate the solutions and categorize them in the different major classes: potentially, weakly, directionally growth-coupled production (pGCP, wGCP, dGCP) and substrate-uptake coupled production This article is organized as follows: Section 2 introduces the FastKnock algorithm, which we designed to effectively search the metabolic network to nd all reaction knockout strategies that result in the overproduction of the desired biochemical(s). Section 3 presents the results of in silico experiments employing highly curated GEMs of E. coli. Last, Section 4 presents our conclusions.

The proposed method
We developed the FastKnock algorithm, which is a general framework that can be used to increase the production rate of the desired metabolite in a cell simultaneously with growth. The desired metabolite can be of a primary or secondary type and can be native or heterologous. Speci cally, the algorithm can be applied on heterologous metabolites through the inclusion of the associated pathways into the GEM set.
In other words, FastKnock identi es reactions to be deleted from the network while ensuring that the ux of biomass formation reaction remains above a speci c cut-off (i.e., 1% of gr WT , Supplement D) and the production of the desired substance(s) increases as much as possible [61]. For practical applications, FastKnock can be used to nd the subsets of network reactions that can be removed in order to signi cantly increase the production of the desired biochemical. Speci cally, FastKnock identi es the strains in which the production rate of the desired biochemical is more than a prede ned threshold in the base model (i.e., the model without any interventions). We call this threshold Th chemical , which we de ne as 5% of the maximum theoretical yield (i.e., the optimal production rate of the desired biochemical when it is considered the objective of the cell) in the base model. FastKnock, like other common approaches, uses preprocessing to reduce the size of the metabolic model reactions and the search space. In the preprocessing phase (Supplement C), the set of the removable reactions (denoted by Removable) is identi ed and structurally excluded from the metabolic network to produce a reduced model denoted as Reduced_model. The set of reactions of the Reduced_model is called RXNS.
The search space of the exhaustive search includes all sets of reactions of the Reduced_model with a particular size. This search space grows exponentially as the size of the set increases. Therefore, examining all sets using an exhaustive search is very time-consuming and would be infeasible. To tackle this problem, our proposed algorithm uses the information that is available only during the search procedure to dynamically narrow the search space (i.e., the search space is iteratively pruned and some reactions are temporarily excluded). This reduced search space is used to nd the knockout strategies; therefore, we call it the target space.
For practical applications, one important feature of FastKnock is that it can optionally consider genes as the basis of reaction deletions. This is a realistic assumption because knocking out genes to remove a speci c reaction often leads to removing a predetermined set of reactions that are simultaneously knocked out. In this work, we label a set of reactions as co-knocked out if they are removed due to the elimination of a single gene. Supplement E explains a modi cation of the algorithm based on knocking out genes rather than reactions.

FastKnock algorithm
Our proposed method aims to nd all solutions to a strain optimization problem to achieve the growth-coupled overproduction of a metabolite (i.e., biochemical) of interest. Each solution is a set of k reactions (i.e., a knockout strategy) such that the elimination of these reactions creates a new engineered strain in which the overproduction of the biochemical of interest is coupled with cell growth.
Testing whether a set of reactions is a proper solution is equivalent to solving an optimization problem in which the objective function is the growth of the cell and reactions elimination corresponds to adding constraints to the optimization problem. By solving this optimization problem, we obtain the ux of all the reactions including the production rate of a desired biochemical. An appropriate solution (i.e., a knockout strategy) should satisfy the objective function along with providing a suitable production rate for the desired biochemical product.
To nd all subsets of reactions of size ≤ k, we consider a tree-based representation of all the combinations of reactions with a maximum size of k, which is outlined below. All sets of k reactions are placed in nodes of the tree in depth k (i.e., at the level k). The root node at level zero corresponds to removing no reaction (i.e., wild-type microorganism). The FastKnock procedure starts with investigating the elimination of a single arbitrary reaction r1 at level one. Whether knocking out r1 is a solution or not, we continue investigating simultaneous elimination of r1 and another reaction at level two. At each level, we consider only the reactions that have non-zero ux according to the optimization problem solved in the parent node in the upper level. The procedure of adding reactions with non-zero ux to the set of knockout reactions continues at lower levels of the tree until one of the two stopping conditions is met: a) we reach a leaf at level k (the prede ned number of knockouts) or b) we reach a node that is guaranteed to have no solution in its subtree.
To check condition b in each node at level l<k, we determine whether the subtree may not include a solution by investigating the optimization problem. Speci cally, if the optimization problem already has an infeasible region at a node, adding more constraints in the subtree of the node would not lead to a proper solution (Supplement F).
The merit of the procedure is the technique of bounding the search by a) excluding the reactions with zero ux at each node and b) checking the feasibility of reaching a solution before expanding the subtree of each node. This way, we dynamically and effectively prune the search space. Figure 1 illustrates the overall procedure using a depth-rst traversal tree. The root node corresponds to the base model in which no reaction is deleted. Algorithm 1 represents the de nition of a node in the tree, as well as, the main procedure of the FastKnock algorithm. Each instance of the Node contains the model, the set of the removed reactions, the search space, and the target space for the next level ( Figure 1). Speci cally, at each node X of the tree at level L, we investigate a set of L reactions (deleted_rxns) to determine (a) whether X is a solution and (b) the new target space, which is the set of all reactions that could potentially be added to deleted_rxns for investigation at the next level.
Determining the target space at each node is critical, and it allows us to avoid the combinatorial explosion of the tree that would inevitably result from an exhaustive search. In particular, while we investigate drastically fewer subsets of reactions at the children nodes in Level L+1, our analysis guarantees that FastKnock will nd every candidate solution (Supplement F).
In Algorithm 1, the traversal of the tree shown in Figure 1 is represented by a set of queues: queue 1 to queue target_level . Each queue contains a set of nodes. At each moment during the execution of the algorithm, queue l contains all children of a certain node at level l-1 being investigated. In this way, the subtrees are gradually constructed and removed (pruned).
The main algorithm consists of three functions: identifyTargetSpace, constructSubTree, and traverseTree. For each node, we compute a target space and a ux distribution using the identifyTargetSpace function. This function temporarily narrows the search space for the whole subtree of the node. The subtree of a node is constructed using the constructSubTree function. The traverseTree function recursively navigates the tree, based on a depth-rst traversal.
We elaborate these functions in the following subsections. First, we determine the target space and then describe the search procedure (i.e., how the traversal tree is partially constructed and traversed). In our implementation, we improved the quality of the obtained solutions by guaranteeing the minimal chemical production rate (Supplement I), and increased the speed of the algorithm using parallel processing (Supplement G).

Identifying the target space
At steady state, a speci c ux range for each reaction r is obtained (minFlux r ≤ f r ≤ maxFlux r ), which leads to the optimal cellular objective (e.g., maximizing the biomass formation ux). Knocking out a reaction r is implemented by setting the allowable ux range [62] of the reaction to zero (i.e., lb r = ub r = 0 in the optimization problem of Equations a.1 and a.5 in Supplement A). Note that when a reaction is reversible (i.e., the obtained ux range of a reaction includes zero minFlux r ≤ 0 ≤ maxFlux r ), knocking out that reaction alone has no effect on the optimal objective value of the network (Supplement F).
Here, the main idea is to prune the target space by considering only the set of reactions with nonzero ux values. This approach signi cantly reduces the size of the target space and thus reduces the execution time of the algorithm.
The target space of each node, which is the set of reactions that could be appropriate for deletion, is obtained using the identifyTargetSpace function (Algorithm 2). The search operation at each node is limited to Rxns + ∩ Removable, as shown in Line 6 of Algorithm 2.
It is worth mentioning that by any manipulation of the model, the uxes of other reactions may change. Therefore, the functional states (i.e., ux distribution) should be analyzed repeatedly after each modi cation (i.e., after each reaction knock-out) using FBA to identify the reactions that carry non-zero ux in the network (model X ) (Lines 4-5). The ux_dist variable of the node is updated at Line 4. The intersection of these reactions and the Removable set construct the target space of node X in Line 6.

The search procedure
Here, we introduce a depth-rst search procedure based on the traversal tree of Fig 1. Each node of the tree has its own subtree, which is traversed before traversing its sibling nodes. This depth-rst search procedure is implemented using the traverseTree function of Algorithm 4.
In each call, the traverseTree function visits a certain node X (i.e., the rst node of the queue level ) and, if needed, calls the constructSubTree function to create the corresponding subtree of the node (Algorithm 3). The constructSubTree function creates the children nodes of X, which is a set of nodes that are placed n level = X.level + 1. For each child, deleted_rxns is initialized by adding one of the reactions in X.traget_space to the X.deleted_rxns.
It is clear that the order of the knocked-out reactions is not important. In FastKnock, repetitive permutations of the reactions are ignored using a checked level queue for each level of the tree. Generally, N levels are considered for simultaneously knocking out N reactions from the cell. Precisely, the reaction selected for the i th level is not allowed in the (i+1) th to N th levels. To generate all combinations of these reactions, the checked L queue is used at level L. At level L, by deleting a reaction r from the target space, r is added to the checked L . This excludes the reaction from the target space of the subsequent levels.

Results
We implemented the FastKnock algorithm using Python language programming (Version 2.7) and the COBRApy library (Version 0.15.4) [63]. Our source code is publicly available at https://github.com/leilahsn/FastKnock and also as supplementary material. We evaluated the performance of FastKnock using various examples, and we compared these results to an alternative approach.
We tested the production of primary metabolites focusing on two cultivation conditions: The rst condition is CM1: iM9 medium supplemented with glucose (a maximum allowable glucose uptake rate of 10 mmol.gDW -1 h -1 ) under aerobic conditions (a maximum allowable oxygen uptake rate of 15 mmol.gDW -1 h -1 ).
Many of the models' reactions are not active in the minimal iM9 medium. In a complex and rich environment, due to the activation of more inputs to the cell, more pathways and consequently more reactions are active in the network. Hence, in order to further evaluate the exhaustive enumeration performance of the FastKnock algorithm in a rich cultivation condition, we conducted additional in silico experiments considering Luri-Bertani (LB) medium. The iLB medium constraints were intended based on [64], [65]. We deployed the two highly-curated E. coli GEMs (i.e., iJR904 and iML1515 [20]) for the experiments. The input settings (i.e., exchange uxes) to de ne the mediums for the models used in the current study are listed in the exchanges.xls le.
The secondary metabolite, lycopene, is produced in the cell only under aerobic conditions. We considered two strains for lycopene production. For the rst strain (Strain1), the lycopene biosynthesis pathway (i.e., the methylerythritol phosphate (MEP) pathway [66]) is added to the wild-type E. coli model [39][67] [68]. For the second strain (Strain2), some other modi cations are applied based on [69]. This provides an intracellular pool of pyruvate as the important precursor of lycopene production [70]. Tables 1 and 2 in Supplement J.I show the maximum theoretical yield for the biosynthesis of the metabolites (i.e, maximum of V chemical ) and our threshold for their production (Th c hemical = 0.05 V chemical ).
The result of the preprocessing phase is shown in Table 2 of Supplement J.I, which demonstrates the number of reactions that are excluded from the search space before the main exploration procedure is applied and before the removable reactions are obtained. The size of the search space is drastically reduced to 20% of all the reactions. In the Reduced_model, the blocked reactions and dead ends are removed [62]. Also, as described in Section 2, after the preprocessing phase, the search space is reduced iteratively and temporally during the search procedure of the FastKnock algorithm. This signi cantly reduces the number of linear programming problems (LPs) that must be solved. Speci cally, compared to an exhaustive search, the reduction rates are 78%-85% for single knockouts, 95%-97% for double knockouts, 99.0-99.5% for triple knockouts, and above 99.8% for quadruple and quintuple knockouts ( Table 1). The number of LPs is equal to the number of nodes in the traversal tree shown in Figure 1, and it is independent of the target metabolite to be produced.
In comparison, in the exhaustive search the algorithm must check all the combinations of the reactions in the search space. For instance, iJR904 in CM2 has 208 reactions in its search space. For nding double-knockout results in the exhaustive search, the algorithm must check all the double combinations of the elements in the search space (c(208, 2) = 21,528). Due to its time complexity, the exhaustive approach is not feasible for high-order reaction knockouts; thus, we compared FastKnock to a simple exhaustive search method for single, double, or triple knockouts. Our experiments showed that a signi cant reduction in the number of LPs is critical because it allows us to investigate and nd all possible solutions. Table 2 presents the total number of solutions obtained using the FastKnock algorithm. The results are reported in two cases: the maximum production rate (Rate max ) and the guaranteed production rate (Rate grnt ) as discussed in Supplement I.
We also compared our solutions to the results of the exhaustive search for single, double, and triple deletions for succinate production in iJR904 to verify the completeness of the FastKnock algorithm. Both approaches found two solutions for a single deletion. The exhaustive search for a double deletion found 398 solutions, of which only 58 solutions were true double deletions. The rest of the solutions were not acceptable because either (a) the combination of each single deletion solution and a zero-ux reaction was inappropriately considered as a double-deletion solution or (b) the elimination of a reaction in the coknocked-out sets led to the removal of all the reactions in the set, while in the exhaustive search, the removal of each reaction in the set is counted as a separate solution. For triple deletions, the exhaustive search found 39,407 solutions, of which 887 were unique and acceptable. FastKnock found all the 887 solutions. Table 3 presents the best solutions in iJR904 GEM as Rate grnt mode. Supplement J.II shows the results for the iAF1260 and iJO1366 GEMs as well as the maximized solutions. As an example, we found that the best result for succinate overproduction is obtained by deleting one reaction, ADHEr, which is knocked out by the deletion of the gene b1241. Consequently, the deletion of the b1241 gene also causes the deletion of the LCADi_copy2 reaction. In this situation, the growth rate is 0.16 (1/h) as shown in the biomass formation rate column. After the deletion of ADHEr, the succinate production can vary between 5.11 and 9.50 mmol.gDW -1 h -1 , which is more than the 0.85 mmol.gDW -1 h -1 threshold; hence, an acceptable amount of succinate production is guaranteed. Moreover, Table 3 presents the production envelopes calculated for succinate production ( Figure 2).
For practical applications, various evaluation indices, including product yield, SSP, and SoGC [55], and other important indices re ecting environmental and operational considerations, can be used to choose the most appropriate cases from the solutions found by FastKnock (Table 4 and Table 5). In particular, the feasibility of CO 2 bio xation and minimal production of undesired or toxic byproducts are also signi cant indexes for systems metabolic engineering purposes. For instance, an engineered strain that can simultaneously x CO 2 and produce a suitable biochemical might be preferred regarding environmental considerations. When all solutions are available, the analysis and identi cation of such appropriate cases is easily possible.
We analyzed FastKnock solutions in order to nd the most appropriate solutions based on three criteria, yield, SSP, and SoGC (Table 5). Additionally, the feasibility of CO 2 bio xation is also examined and the relevant results are summarized, where a negative CO 2 exchange ux represents a desirable CO 2 uptake rate. We compared these best solutions obtained by FastKnock with the associated OptKnock results as well as experimental data available in the literature [71][72] [73]. Note that Optknock aims at, and terminates on, nding a single solution. Therefore, comparing it vs. FastKnock in terms of computational costs is not meaningful.
We found that a solution with the best production rate or an optimal solution of the optimization algorithms such as OptKnock does not necessarily bring the best SoGC and the other desired indexes. However, by identifying all the possible solutions for the problem, FastKnock allows a comprehensive analysis. For example, knocking out ADHEr, ATPS4r, and LDH_D is expected to lead to the best biomass formation rate (0.16 h -1 ) and the highest SoGC (3.01 h -1 ), which is twice the best SoGC provided by OptKnock solutions while the other indices corresponding to this knockout are comparable with the best numbers shown in the table (i.e., a production rate of 8.90 vs. 12.24 mmol.gDW -1 .h -1 , a yield of 0.89 vs. 1.22, an SSP 1.42 vs. 1.46 h -1 , and a CO 2 uptake rate of -8.76 vs. -9.36 mmol.gDW -1 .h -1 ). A relatively high value of SoGC can also be desirable from a dynamic perspective because it indicates that even under non-optimal conditions, the biosynthesis of the target biochemical is coupled with the growth of the production strain. This situation is usually encountered in batch and fed-batch cultivations in the logarithmic phase of growth.
A more striking example is the comparison between the PTAr, PYK, ATPS4r, and SUCD1i quadruple knockout identi ed by OptKnock with the two solutions with the best production rate (ADHEr, LDH_D, PFL, and THD2) and the best SoGC (ADHEr, LDH_D, HEX1, and THD2) identi ed by FastKnock. While the biomass formation rate of the FastKnock solutions (0.11, 0.13 h -1 , respectively) are comparable with the OptKnock solution (0.16 h -1 ), the yield and SSP is an order of magnitude higher for FastKnock solutions. A serious issue with this OptKnock solution is the very low SoGC (0.01 h -1 ), which indicates that the production rate would be hardly coupled with growth. In comparison, the predicted SoGC for FastKnock solutions are 2.85 and 3.09 h -1 , respectively. Another disadvantage of OptKnock solution is a relatively high CO 2 production rate of 9.03 mmol.gDW -1 .h -1 while in the FastKnock solutions the CO 2 uptake rates are -6.12 and -8.77 mmol.gDW -1 .h -1 , respectively.
Among the quintuple knockouts, the predicted SSP and SoGC for one of the FastKnock solutions (ADHEr, LDH_D, GLUDy, PFL, and THD2) are almost twice those of the OptKnock solution (ADHEr, LDH_D, PTAr, PYK, and GLCpts) while the other indices are comparable.

Comparing
FastKnock to MCSEnumerator (case study: ethanol overproduction in E. coli AF1260) As mentioned previously, MCSEnumerator is a novel method for metabolic engineering based on the identi cation of minimal cut sets [50]. This approach applies a ltering step to reduce the computation time, which allows the user to nd thousands (but not all) of the most e cient knockout strategies in genome-scale metabolic models. MCSEnumerator can be used to nd a large number of metabolic engineering interventions, but it has various drawbacks. In this section, we compare MCSEnumerator with FastKnock. To aid in this comparison, we consider the case study of ethanol production in E. coli iAF1260 GEM with an 18.5 mmol.gDW -1 h -1 glucose uptake rate under anaerobic conditions (iM9 medium) as presented in the MCSEnumerator publication.
We should discuss the effect of the MCSEnumerator thresholds on its solution set. It would not be feasible to apply MCSEnumerator using thresholds that are relaxed enough to nd all the solutions (Supplement H). We illustrate this with an example in Figure 3. The blue production envelope, which has the best SoGC value, is associated with a solution found by both MCSEnumerator and FastKnock. The associated solutions (with the red and green diagrams), which are the worst cases among the shown envelopes, were not found by MCSEnumerator because of the production threshold considered. This illustrates the e ciency of the primary ltration of the MCSEnumerator method. The starting point might not be the best factor for ltering appropriate solutions. For example, the minimum production rate based on the orange envelope is similar to the green envelope in Region Y3, which is below the threshold considered for ethanol production ux. Nevertheless, the orange envelope may still be associated with a proper solution due to its relatively high SoGC, but it was not found by MCSEnumerator.
Furthermore, the predetermined thresholds may lead to the fact that some of the solutions obtained by MCSEnumerator are not necessarily and truly minimal.
It means that an appropriate solution of cardinality (n) may exist and not be found while it appears in some higher-order solutions (>n) which contain irrelevant additional reactions.
While the MCSEnumerator algorithm and its modi ed editions may have shorter execution times, the number of solutions they can provide with certain settings is only a very small percentage of the total potential solutions. Therefore, comparing the MCSEnumerator and FastKnock algorithms based solely on execution time is not rational since these algorithms neither produce the same output nor have the same objective.

Discussion
Overproduction of biochemicals of interest coupled with signi cant growth rates might be optimistic and may not always be easily achievable due to e.g., competing pathways in a metabolic network [43]. This can lead to weak coupling especially under suboptimal growth conditions. Alternatively, strong coupling requires that production must occur even without growth [14]. Speci cally, product synthesis rate is said to be strongly coupled with biomass formation if the product yields of all steady-state ux vectors are equal to or larger than a prede ned product yield threshold [15]. Accordingly, SoGC is de ned as the square of the product yield per unit substrate divided by the slope of the lower edge of the production curve [55] (see Figure 2).
SoGC is a non-linear objective function and thus OptKnock and most of the in silico strain design methods cannot be used to nd knockouts with optimal SoGC. OptGene [37] is a heuristic approach that can be used to identify a single knockout strategy with optimal SoGC [55]. However, knocking out the single identi ed solution by OptGene may not be practically feasible e.g, due to the genes' loci. Therefore, identi cation of all knockout strategies by FastKnock is desired and provides the expert experimentalists with the opportunity to choose from a short list of knockout strategies that are ltered for a relatively high SoGC, SSP, yield, etc. This shortlist can be investigated for advantageous solutions in terms of environmental considerations such as CO 2 bio xation [71][72], minimal production of undesired or toxic byproducts, practicality of knocking or silencing genes, etc ( We proposed an e cient next-generation algorithm, FastKnock, which identi es all proper reaction or gene knockout strategies for the overproduction of a desired biochemical. We reached this goal by signi cantly pruning the search space without omitting any solutions. For example, in our experiments, FastKnock was required to explore only 1% of the search space in the pruned model when identifying all triple-knockout strategies. The rate of this reduction increases as more reactions are knocked out (e.g., about 0.1% for quadruple-knockout strategies and about 0.01% for quintuple-knockout strategies) ( Table 3).
This drastic reduction of the search space enables our novel FastKnock method to nd the set of all possible solutions in a feasible time duration.
Finding the best and most suitable trade-off between cellular growth and the production of the desired biochemical is one of the key bene ts of FastKnock results. Moreover, determining all possible solutions allows for the selection of the most appropriate strategy based on any desired evaluation index, including product yield, SSP, and SoGC (Table 4 and Table 5). This is an important and useful feature of our search strategy, especially for practical applications [59].
We compared FastKnock to MCSEnumerator [50], which has been shown to nd more e cient solutions than the MCS methods [76] [77][78]. We found that the solutions identi ed by MCSEnumerator may not be minimal. Also, due to initial ltering, MCSEnumerator misses solutions that may be practically more appropriate than the best solutions it nds. In comparison, FastKnock identi es all minimal solutions, which can be mined later based on any desired criteria.
When all solutions are available, one interesting analysis that can be conducted is to identify the reactions or genes that are common among a relatively large number of solutions. For instance, in the case of iJR904, to produce succinate in iM9 under anaerobic conditions (CM2), about 70% of solutions include at least one of ADHEr or PFL reactions (Figure 4). Moreover, when three or more reactions are to be deleted, the best results in terms of the succinate production rate include both ADHEr and PFL (Table 4). Collectively, this analysis suggests that ADHEr and PFL reactions support pathways that compete with succinate production, and these pathways are blocked when ADHEr and PFL are eliminated [79][80]. Based on this analysis, we suggest using a heuristic for higher-level knockout combinations in which one or more reactions (e.g., ADHEr or PFL) are removed in searches for six or more knockouts. In this way, one would need to search for fewer reactions to knockout. We believe this heuristic would reduce the search space by an order of magnitude at the expense of losing not more than half of the solutions.

Conclusion
While in silico results do not necessarily lead to in vivo overproduction, obtaining all possible knockout strategies is critical for determining the best practical and most e cient strategy. The FastKnock algorithm is a general framework that can be used to overproduce any metabolite. It is not limited by factors such as complexity of the cultivation conditions or large size of the metabolic network of the desired strain. FastKnock identi es strategies with a production rate higher than the desired threshold determined by the user.

Declarations
Author summary Metabolic systems biology aims to computationally analyze metabolic networks at systems level to comprehensively delineate their potential and probable functionalities. This approach is bene cial to metabolic engineers for nding appropriate rewiring targets and designing production strains with favorite metabolic behaviors. Recent in silico strain design approaches can identify several candidates and provide us with the opportunity of selecting the potentially operative solution among the identi ed candidates. Here, we present FastKnock, an e cient next-generation algorithm for effectual design of growth-coupled production strains. Among the existing algorithms, FastKnock is the only one capable of providing all possible solutions for multiple gene and reaction knockouts to overproduce a desired (bio)chemical. We achieve this by developing a special depth-rst traversal algorithm that allows us to signi cantly prune the search space. The identi cation of all the solutions allows metabolic engineers to additionally evaluate, rank, and choose the most appropriate intervention strategies. FastKnock's users can consider various criteria in a post-processing phase and examine the solutions without repeating the algorithm execution. Applying FastKnock to overproduce a number of biochemicals in E. coli genome-scale metabolic models, we found solutions that have not been reported by using state-of-the-art approaches.
Ethics approval and consent to participate Not applicable Consent for publication Not applicable Availability of data and materials All data generated or analyzed during this study are included in this published article [and its supplementary information les]. Our implementation of the FastKnock method in Python is publicly available at https://github.com/leilahsn/FastKnock.

Competing interests
We declare that the authors have no competing interests as de ned by BMC, or other interests that might be perceived to in uence the results and/or discussion reported in this paper.         Figure 1 The traversal tree. All possible solutions are identi ed through a depth-rst traversal of the tree. First, the identifyTargetSpace function is applied in the root node to the reduced wild-type network to determine the target space. Each reaction in this set is individually selected and removed from the network in Level 1. For each deleted reaction (or equally node) in Level 1, the identifyTargetSpace function is recalled to obtain the target space for the next level. For simplicity, we show only two levels of the traversal of the tree, which is enough to identify all single and double deletions. Five exemplar production envelopes for strategies identi ed by FastKnock for ethanol production in iAF1260, which is partitioned into four regions based on the growth rate (x axis) and the production ux (y axis) as in [15]. The horizontal dashed line indicates the threshold for production rate considered in [15], and the vertical dashed line indicates the growth rate threshold. SoGC, product yield (Yp/s) and SSP of the quadruple knockout strategies are shown in the top right legend. Unlike FastKnock, MCSEnumerator nds none of these strategies except the one shown in blue.

Figure 4
The rate of presence of the ADHEr and PFL reactions in all possible solutions counted in Table 4 for succinate production.