Genome-scale top-down reduction of phages to generate viable minimal phage genomes

Reducing tailed-phage genomes to generate viable phages with minimal genomes is important for expanding our understanding of phage biology and can provide insights for phage synthetic biology. Many efforts have been made to minimize living cells, but such work remains challenging for phages because of their extraordinary genomic diversity and the lack of genome-scale editing techniques. Here, we developed a CRISPR/Cas9-based iterative phage genome reduction (CiPGr) approach to detect the nonessential gene sets of phages and minimize phage genomes. During CiPGr, inactivated genes accumulated in the phage genome, and mutant progeny with robust growth gradually arose and eventually became predominant in the populations. CiPGr was applied to four distinct tailed phages (model phages T7 and T4; wild-type phages seszw and selz), yielding mutants with 8–20% (3.3–33 kbp) of their genome sequences deleted, i.e., minimal genomes. Metagenomic sequencing of the resulting mutant phage populations showed that 46.7% to 65.4% of the genes of these phages were deleted or disrupted. The loss of some genes in the removable gene sets (39.6%–50%) was likely severely detrimental to phage growth, causing the corresponding mutant progeny to recede in the populations; consequently, these gene losses were not detected in the genomes of the isolated mutants. In summary, our results for these four distinct tailed phages demonstrate that CiPGr is a generic yet effective approach that can be applied to novel phages without prior knowledge.

Background
Bacteriophages (phages) are the most abundant 1 and genetically diverse 2-4 biological entities on earth.
For more than a century, phage research has been critical to numerous fundamental biological discoveries 5,6 and has provided important biotechnological tools for molecular genetics 7-9. Recent advances in viral metagenomic sequencing have revealed a vast number of phage sequences in the human gut 2,3 and various other environments 4. The majority of these sequences (>75%) are new 4,10, and over 95% belong to tailed phages with a double-stranded (ds) DNA genome 11,12. The novelty of these tailed phages limits our understanding of phage biology and of their roles in ecosystems 4. In addition, phages have been recognized as natural antimicrobial agents for treating bacterial infections 13,14. However, major concerns remain regarding the use of wild-type (WT) phages in clinical phage therapy 15, e.g., their extremely high host specificity for a small set of bacterial strains and the rapid emergence of bacterial resistance. Therefore, many efforts have been made to genetically engineer phages to overcome these limitations 16,17.
Phage synthetic biology can integrate functional genes or gene circuits into phage genomes, bolstering their antibacterial activity and their potential in various bioengineering applications 16,17. Such integration can be challenging because of the limited DNA packaging capacity of viral particles 18,19. The accessory genes of phages may help them adapt to a wide range of ecological niches but may be nonessential for phage development under given conditions 20. Removing these nonessential genes can free up space in phage genomes. This raises the question of how far the genome of a given phage can be reduced while remaining viable, especially for novel phages with many genes of unknown function 4,10. Removing dispensable genes from a phage could also advance our understanding of phage-host interactions and the phage lifestyle. This understanding could in turn enable the redesign of phage genomes with additional functions 21,22, paving the way for fundamental biological discoveries.
Phages are among the simplest biological systems. Nontailed phages, e.g., M13 23 and phi X174 24,25, have very compact genomes (<10 kb) encoding a limited number of genes, which makes the functions of their genes comparatively easy to study. Tailed phages often have relatively large (14 kbp to 500 kbp) 13,14 and extraordinarily diverse genomes 26, implying that they likely employ complex and heterogeneous propagation mechanisms within their host cells. Obtaining viable minimal genomes of tailed phages on a large scale remains challenging. One challenge is identifying the nonessential genes of a given phage 27: there are no efficient approaches for identifying phage nonessential genes on a genome-wide scale. For phage T7, a nonessential gene set (25 genes, 41.7%) has been compiled from knowledge accumulated over numerous studies in recent decades 28,29.
These genes are interspersed throughout the genome, hindering efforts to reduce the T7 genome. Approaches widely used for genome reduction in living cells, e.g., Escherichia coli 30, yeast 31, Bacillus subtilis 32, and Mycoplasma mycoides 27, might not be applicable to tailed phages on a large scale, because phages can propagate only within their hosts. Moreover, the bioinformatic approach of comparing complete bacterial genomes to identify the minimal gene set for cellular life 33 often fails for phage genomes because of their highly divergent nature 4. Recently, the clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system has been employed for phage genome editing 34-37. However, stepwise deletion of many genomic regions is laborious and time-consuming, especially when mutant phages that maintain robust growth must be screened at the same time. J. Craig Venter and colleagues used transposon mutagenesis to place M. mycoides JCVI-syn1.0 genes into three categories: essential, quasi-essential, and nonessential.
Quasi-essential genes are not critical for viability but are nevertheless required for robust growth 27. The same distinction likely applies to phage genes, which poses another challenge: screening numerous heterogeneous mutants for those with viable minimal genomes that are still capable of robust growth 38.
To address these challenges, we report a top-down genome reduction approach termed CRISPR/Cas9-based iterative phage genome reduction (CiPGr). In the CiPGr process, a specific phage infects host bacterial cells harboring a dual plasmid system consisting of pTarget, which carries heterogeneous spacers 39, and spCas9. After the phage injects its DNA into the cell, the single-guide RNA (sgRNA, encoded by pTarget) directs the Cas9 nuclease (encoded by spCas9) to cleave the target gene at the designed site; the break is then repaired by homologous recombination (HR) with the HR template on pTarget, causing gene deletion or disruption 34. If the gene is not essential for phage growth, the mutant progeny can multiply without it. The mutant phage population is serially transferred to fresh cells harboring the dual plasmid system, so gene mutations accumulate in the phage genome. Phages with minimal genomes that retain a growth advantage are expected to predominate in the population and are finally selected. We assessed CiPGr on four diverse tailed phages (T7, T4, seszw, and selz) and obtained removable phage gene sets and phages with viable minimal genomes from the mutant populations, demonstrating the value of CiPGr for phage synthetic biology research.

Development of the CiPGr approach
The CiPGr approach was performed according to the scheme presented in Fig. 1. First, to engineer the phage genes of interest, the CRISPR spacer and HR template were encoded in a single oligonucleotide (CiPGr cassette). The 200-bp CiPGr cassette harbored a 20-bp spacer sequence, a 36-bp promoter sequence, and a 100-bp HR template sequence with 50-bp arms (Fig. 1a). We added barcode (20 bp) and primer (24 bp) sequences to the ends. This design allowed the cassettes to be synthesized on a DNA chip. Second, the synthesized CiPGr cassettes for the different phages were separated using their primer and barcode sequences and cloned into the plasmid pTarget (Fig. 1b). The CiPGr pTarget plasmids were pooled into 8 libraries and transformed into E. coli (spCas9) or Salmonella enterica (spCas9) cells, generating CiPGr plasmid library cells. Third, the phage genome reduction process was started by adding the four WT phages (T7, T4, seszw, and selz) to cultures of the corresponding CiPGr plasmid library cells (Fig. 1c). Because each library contained heterogeneous pTarget plasmids, various mutant phages were generated. These mutant phage populations were serially transferred to cultures of fresh CiPGr plasmid library cells, so gene deletions or disruptions accumulated in the mutant progeny over successive transfers. The mutant phages in the populations were expected to vary in growth fitness; consequently, mutants with a growth advantage would gradually become predominant in the population after a series of transfers.
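The stated segment lengths can be checked arithmetically: 24 (primer) + 36 (promoter) + 20 (spacer) + 2 × 50 (HR arms) + 20 (barcode) = 200 bp, which matches the cassette length if each cassette carries one primer site and one barcode. The Python sketch below assembles and length-checks such an oligo; the function name `assemble_cassette` and the 5'-to-3' segment order are assumptions for illustration, not the published layout.

```python
# Segment lengths (bp) taken from the text; the ORDER of segments is hypothetical.
SEGMENT_LENGTHS = {
    "primer": 24,
    "promoter": 36,
    "spacer": 20,
    "hr_left": 50,
    "hr_right": 50,
    "barcode": 20,
}
ORDER = ["primer", "promoter", "spacer", "hr_left", "hr_right", "barcode"]
CASSETTE_LENGTH = 200


def assemble_cassette(**segments: str) -> str:
    """Concatenate segments in ORDER after checking that each has the expected length."""
    missing = set(ORDER) - set(segments)
    extra = set(segments) - set(ORDER)
    if missing or extra:
        raise ValueError(f"missing: {sorted(missing)}, unexpected: {sorted(extra)}")
    for name in ORDER:
        expected = SEGMENT_LENGTHS[name]
        if len(segments[name]) != expected:
            raise ValueError(f"{name}: {len(segments[name])} bp, expected {expected} bp")
    cassette = "".join(segments[name] for name in ORDER)
    if len(cassette) != CASSETTE_LENGTH:
        raise ValueError(f"cassette is {len(cassette)} bp, expected {CASSETTE_LENGTH} bp")
    return cassette
```

Passing, for example, a 19-nt spacer raises `ValueError`, which is a cheap sanity check before a pool is sent for chip synthesis.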
We applied CiPGr to four different tailed phages: T7, T4, seszw, and selz (Supplementary Fig. 1). T7 (39.9 kbp) and T4 (168.9 kbp) are model phages that specifically infect E. coli strains and belong to the families Podoviridae and Myoviridae, respectively. Phages seszw (45.8 kbp) and selz (154.8 kbp) were newly isolated from sewage water samples using S. enterica (ST56) as the host (Supplementary Table 1 and Supplementary Table 2). Phage selz was classified into the family Myoviridae 40. Based on the phylogenetic tree, genome comparison, and transmission electron microscopy (TEM) features (Supplementary Fig. 2), seszw was classified into the family Siphoviridae. We designed two types of CiPGr cassettes per gene: one that disrupts the gene by deleting a 100-bp fragment (cassette-100) and one that deletes the whole gene (cassette-gene). Cassette-100 and cassette-gene used the same spacer but different HR templates. Cassette-100 was intended to disrupt phage genes without substantially reducing genome size. Because on-target activity likely varies among sgRNAs, we selected eight spacers per gene, distributed across the entire gene, to maximize the chance of successful editing. For some genes, fewer spacers were selected because the genes were short or had few predicted spacers. We did not design CiPGr cassettes targeting annotated structural genes, essential genes confirmed in previous reports 17,25, or likely regulatory elements.
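Selecting several spacers per gene can be sketched as a scan for SpCas9 protospacers (20 nt immediately 5' of an NGG PAM) on both strands, followed by picking up to eight sites spread from one end of the gene to the other. This is a minimal illustration under those assumptions; it does not score on-target activity or off-target risk, and the helper names are hypothetical rather than the authors' pipeline.

```python
import re

# Complement table for uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(gene: str, spacer_len: int = 20):
    """Return sorted (start, strand, spacer) tuples for every SpCas9 NGG site.

    `start` is the 0-based start of the protospacer in forward-strand coordinates.
    The lookahead lets overlapping sites be reported.
    """
    gene = gene.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(gene)]
    rc = reverse_complement(gene)
    n = len(gene)
    for m in pattern.finditer(rc):
        # rc[i:i+L] corresponds to gene[n-i-L : n-i] on the forward strand.
        hits.append((n - m.start() - spacer_len, "-", m.group(1)))
    return sorted(hits)


def pick_spread_spacers(hits, k: int = 8):
    """Pick up to k hits spaced evenly from the first to the last site."""
    if len(hits) <= k:
        return list(hits)
    if k == 1:
        return [hits[len(hits) // 2]]
    step = (len(hits) - 1) / (k - 1)  # > 1 here, so the chosen indices are distinct
    return [hits[round(i * step)] for i in range(k)]
```

Spreading sites along the gene, rather than taking the first eight, mirrors the stated goal of spanning the entire gene so that a single poorly performing region does not block editing.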
In total, we designed and synthesized 5,828 unique CiPGr cassettes targeting 926 genes of the four phages, with an average of 5-6 cassettes per gene. Cloning the pooled CiPGr plasmid libraries and delivering them into E. coli (spCas9) or S. enterica (spCas9) yielded 2×10⁵ to 2×10⁶ colonies, corresponding to 176 to 5,865 colonies per individual plasmid in the libraries (Supplementary Table 3). We assessed the CiPGr plasmid libraries by next-generation sequencing (NGS) and observed that 42.38% (±2.98%) of the cloned CiPGr cassette sequences were identical to the designed sequences. On average, these correct sequences captured 97.34% (±3.36%) of the designed cassette sequences in each of the eight libraries and covered 99.33% (±0.98%) of the genes of interest, providing near-complete coverage for our genome-wide engineering (Supplementary Table 4 and Supplementary Fig. 3). Some designed CiPGr cassette sequences (218, 3.74%) were not detected in the libraries, likely because of low sequencing depth or high error rates in DNA synthesis. Next, the four WT phages were used to infect the CiPGr plasmid library cells (Fig. 1c). The mutant populations of the four phages in the eight libraries were serially transferred 300 to 600 times, corresponding to over 2,000 generations of the phage life cycle (Supplementary Table 5).
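The library metrics above (colonies per plasmid, fraction of designed cassettes recovered error-free, and fraction of target genes still covered) can be summarized as in the sketch below. It assumes that colonies per plasmid is the colony count divided by the number of distinct designed cassettes and that a cassette counts as detected when at least one read matches the design exactly; the function and argument names are hypothetical.

```python
def library_coverage(colonies: int, cassette_gene: dict, exact_read_counts: dict) -> dict:
    """Summarize one pooled plasmid library.

    colonies: transformant colonies obtained for the library
    cassette_gene: designed cassette ID -> targeted gene
    exact_read_counts: cassette ID -> NGS reads matching the design exactly
    """
    designed = set(cassette_gene)
    detected = {c for c in designed if exact_read_counts.get(c, 0) > 0}
    genes = set(cassette_gene.values())
    genes_covered = {cassette_gene[c] for c in detected}
    return {
        "colonies_per_plasmid": colonies / len(designed),
        "cassettes_detected": len(detected) / len(designed),
        "genes_covered": len(genes_covered) / len(genes),
        "missing_cassettes": sorted(designed - detected),
    }
```

As a consistency check on the text, 218 undetected cassettes out of 5,828 designed corresponds to 218 / 5828 ≈ 3.74%.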

Detection of the removable gene sets of the phages
We initially assumed that removing some genes from these phage genomes would not disrupt the phage life cycle but could have different impacts on growth fitness. To detect the removable gene sets of the phages, we performed metagenomic sequencing of the mutant phage populations generated during serial transfer. The metagenomic reads from the two types of plasmid libraries (cassette-gene and cassette-100) showed that the number of genes disrupted or deleted as designed by CiPGr was 28 in T7 (46.7%), 52 in seszw (64.2%), 120 in T4 (43.3%), and 96 in selz (47.8%). In addition, losses of large fragments and point mutations were detected in the genomes of the mutants of the four phages, increasing the number of deleted or disrupted genes by 0 in T7, 1 in seszw, 19 in T4, and 17 in selz (Fig. 2a). These losses and mutations were not designed in this experiment, and some of them can likely be explained by CRISPR-escape mutations (CEMs) of phages 41 (Supplementary Table 7). Moreover, 1,322 one- or two-base deletions were detected in the metagenomic reads of the T4 mutant populations, disrupting 54 genes.
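Why a one- or two-base deletion disrupts a gene can be shown with a toy coding sequence: a deletion whose length is not a multiple of 3 shifts the reading frame, which usually brings an out-of-frame stop codon into register. The sketch below is illustrative only; the sequence and helper names are hypothetical, and it considers a single deletion in a forward-strand CDS.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds: str):
    """Index (in codons) of the first in-frame stop codon, or None if there is none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def delete_bases(cds: str, pos: int, length: int) -> str:
    """Remove `length` bases starting at 0-based position `pos`."""
    return cds[:pos] + cds[pos + length:]


def deletion_effect(cds: str, pos: int, length: int) -> dict:
    """Report whether a deletion shifts the frame and where translation now stops."""
    mutant = delete_bases(cds, pos, length)
    return {
        "frameshift": length % 3 != 0,
        "stop_codon_before": first_stop_codon(cds),
        "stop_codon_after": first_stop_codon(mutant),
    }


# Toy CDS: ATG ATA GCC TAA. Deleting one base at position 3 yields ATG TAG ...,
# a premature stop at codon 1; deleting three bases keeps the frame.
cds = "ATGATAGCCTAA"
print(deletion_effect(cds, 3, 1))
print(deletion_effect(cds, 3, 3))
```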
The removable gene sets of these phages detected by the two types of libraries were mostly consistent (Fig. 2b and Supplementary Table 7). For both T7 and seszw, the frequencies of gene deletion or disruption in the populations were highly correlated between the two types of CiPGr plasmid libraries (cassette-100 and cassette-gene) (r = 0.73 for T7; r = 0.73 for seszw) (Fig. 2b). For T4 and selz, these correlations were lower (r = 0.49 for T4; r = 0.63 for selz) (Fig. 2b). Because the cassette-100 library reduced the phage genomes less than the cassette-gene library did, these results suggest that the amount of sequence deleted did not substantially affect which genes were detected as removable. More importantly, we did not observe a correlation between the frequency of gene mutation and the abundance of the corresponding CiPGr cassette in the plasmid library (Supplementary Fig. 5a), implying that the frequency of gene deletion or disruption did not depend on cassette abundance.
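The between-library agreement reported above is a Pearson correlation of per-gene deletion or disruption frequencies. A minimal sketch follows; restricting the comparison to genes observed in both libraries is an assumption, since the text does not state how genes seen in only one library were handled, and the function names are hypothetical.

```python
import math


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def library_agreement(freq_100: dict, freq_gene: dict) -> float:
    """Correlate per-gene frequencies (cassette-100 vs cassette-gene) over shared genes."""
    shared = sorted(set(freq_100) & set(freq_gene))
    return pearson_r([freq_100[g] for g in shared], [freq_gene[g] for g in shared])
```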
Tracking gene deletion or disruption across serial transfers, we observed that the number of removable genes increased by 18.5% for T7, 8.16% for seszw, 57% for selz, and 78% for T4 from the 20th serial transfer to the last. In the last ~100 transfers, however, it increased by only 0% for T7, 6.1% for seszw, 7.6% for selz, and 10.8% for T4 (Fig. 3), implying that the removable gene sets detected by CiPGr were approaching saturation. Considering only the genes that we designed for deletion and disruption gave a similar result (Supplementary Fig. 4b). The majority of the removable genes of T7 and seszw (81.5% for T7 and 91.84% for seszw) were detected in the early transfers (T20 for T7 and T32 for seszw), whereas the number of removable genes of T4 and selz increased gradually with the number of transfers. This may have been caused by poor cleavage efficiency of the CRISPR system due to DNA modification 43-45. Nevertheless, these results indicate that, in most cases, CiPGr can detect the removable gene sets of different tailed phages.
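The saturation argument can be made concrete by tracking, for each sampled transfer, the cumulative fraction of the final removable gene set detected so far. The sketch below assumes that a gene counts as removable once it has been detected in any sample up to that transfer; the function names and the toy data are hypothetical.

```python
def saturation_profile(detected_by_transfer: dict) -> list:
    """Cumulative fraction of the final removable gene set detected by each transfer.

    detected_by_transfer maps a transfer number to the set of genes found
    deleted or disrupted in that population sample.
    """
    seen = set()
    counts = []
    for transfer in sorted(detected_by_transfer):
        seen |= detected_by_transfer[transfer]
        counts.append((transfer, len(seen)))
    final = counts[-1][1] if counts else 0
    if final == 0:
        raise ValueError("no removable genes detected")
    return [(t, c / final) for t, c in counts]


def fraction_added_after(profile: list, transfer: int) -> float:
    """Share of the final gene set first detected after `transfer`."""
    reached = [frac for t, frac in profile if t <= transfer]
    return 1.0 - (reached[-1] if reached else 0.0)


# Toy data: four genes by transfer 20, one more by transfer 100, none afterwards.
profile = saturation_profile({20: {"a", "b", "c", "d"}, 100: {"a", "e"}, 200: {"b"}})
print(profile)                            # cumulative fractions per transfer
print(fraction_added_after(profile, 100))  # 0.0, i.e. saturated after transfer 100
```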

Phenotype of the mutant phages
To determine the impacts of gene deletions on the phage phenotype, we first measured the infection dynamics of these mutants on a large scale to evaluate their ability to kill the original host cells (E. coli MG1655 and S. enterica ST56 without the dual plasmid system). Based on the removable gene sets, we focused mainly on the T7 and seszw mutants generated in the cassette-gene plasmid libraries. Eight large and eight small plaques (Supplementary Fig. 6a) were picked and purified from double-layer agar cultures of the mutant phages every ten transfers, yielding 480 mutants for each phage over a total of 300 transfers. The infection dynamics indicate that deleting different genes can impair the ability of phages to kill their hosts to various degrees. The large-plaque mutants (kill-time of T7 mutants: 66.5±10 min; seszw mutants: 270±76 min) mostly killed their hosts more effectively than the small-plaque mutants (T7 mutants: 108±36 min; seszw mutants: 240±76 min) (Fig. 4a). Moreover, the killing capability of both large- and small-plaque mutants of T7 and seszw became weaker as the number of transfers increased (kill-time of T7 mutants: 40 to 250 min (WT, 50 min); seszw mutants: 340 to 90 min (WT, 320 min); Fig. 4a). Only a small fraction of the mutants (17.9% for T7; 1.5% for seszw) killed their hosts more effectively than their parental phages.
For T4 and selz, we picked and purified eight plaques every 100 transfers (T4, from the 100th to the 600th transfer; selz, from the 100th to the 400th transfer) and performed infection dynamics assays. T4 mutants showed a trend similar to that of the T7 mutants (Fig. 4a): their ability to kill host cells became weaker (kill-time from 80 min to 250 min) as the number of transfers increased, but a high proportion (25%) of the mutants killed cells faster (kill-time less than 115 min) than WT T4 (Fig. 4a). Interestingly, gene deletion in phage selz lengthened the kill-time (time to reach OD600 < 0.4) from 235 to 345 min (Fig. 4a). These results demonstrate that the impacts of phage genome reduction on phage growth were heterogeneous but mostly detrimental.
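The kill-time used in these assays (the time at which the culture OD600 drops below 0.4) can be read off a sampled lysis curve. The sketch below assumes that only a downward crossing counts, so an initial inoculum below 0.4 is ignored until the culture has grown past the threshold, and it optionally interpolates linearly between the two flanking samples; both choices are assumptions rather than the authors' stated procedure.

```python
def kill_time(times_min, od600, threshold=0.4, interpolate=True):
    """Time at which OD600 first falls below `threshold` after being at or above it.

    Returns None if no such downward crossing occurs within the sampled curve.
    """
    above = False
    prev_t = prev_od = None
    for t, od in zip(times_min, od600):
        if od >= threshold:
            above = True
        elif above:
            # prev_od is the last sample at or above the threshold.
            if interpolate:
                return prev_t + (prev_od - threshold) * (t - prev_t) / (prev_od - od)
            return t
        prev_t, prev_od = t, od
    return None


# Toy curve: growth to OD600 0.6, then lysis.
times = [0, 10, 20, 30, 40]
ods = [0.2, 0.5, 0.6, 0.3, 0.1]
print(kill_time(times, ods))                     # interpolated, about 26.7 min
print(kill_time(times, ods, interpolate=False))  # first sample below 0.4: 30
```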

Correlation of phenotype and genotype of the mutant phages
To link the genotype of the mutant phages with their phenotype, we next sequenced 60 mutant phages selected from the 480 sampled plaques of T7 and seszw. At each sampling point, we selected the mutants with the strongest and weakest host-killing capability based on their killing curves (Supplementary Fig. 6b). Comparing these 60 mutant genomes of T7 (Fig. 4b and Supplementary Fig. 7) and seszw (Fig. 4b and Supplementary Fig. 8), we observed that, as expected, gene deletions accumulated in the genomes of the T7 and seszw mutants as the number of transfers increased. In the T7 mutants, 9 genes (2 kbp) were deleted during transfers 1-100, 1 gene (0.1 kbp) during transfers 100-200, and 2 genes (1.1 kbp) during transfers 200-300; in the seszw mutants, 13 genes (4.7 kbp) were deleted during transfers 1-100, 8 genes (1.9 kbp) during transfers 100-200, and 5 genes (2.5 kbp) during transfers 200-300. Plotting the number of serial transfers against the number of deleted genes (T7: R² > 0.87; seszw: R² > 0.71) and against mutant genome size (T7: R² > 0.75; seszw: R² > 0.66) yielded rarefaction curves (Fig. 4c, d). The T7 mutants with the strongest and weakest host-killing capabilities showed highly similar rates of gene deletion across transfers. From these trends (Fig. 4c, d), we extrapolated that the number of deleted genes in the T7 genome was approaching a maximum, likely yielding a viable minimal phage genome among the mutants. For seszw, the number of deleted genes continued to increase, implying that more genes could be deleted with additional rounds of CiPGr.
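The text does not specify the model behind the rarefaction curves and R² values. One plausible choice, assumed here purely for illustration, is a least-squares fit of deleted genes (or genome size) against the logarithm of the transfer number; the slope b/t of such a curve shrinks as transfers accumulate, which is the behavior used above to argue that T7 approaches a maximum. The function name is hypothetical.

```python
import math


def fit_log_curve(transfers, values):
    """Least-squares fit of values = a + b*ln(transfers); returns (a, b, r_squared).

    All transfer numbers must be positive, and values must not be constant.
    """
    if len(transfers) != len(values) or len(transfers) < 3:
        raise ValueError("need at least three paired points")
    xs = [math.log(t) for t in transfers]
    n = len(xs)
    mx, my = sum(xs) / n, sum(values) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, values))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, values))
    ss_tot = sum((y - my) ** 2 for y in values)
    return a, b, 1 - ss_res / ss_tot
```

Under this model, the expected gain per additional transfer at transfer t is roughly b/t, so the same b gives ten times fewer new deletions per transfer at t = 300 than at t = 30.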
Next, by examining the genomes, we selected five large-plaque mutant phages each of T7 and seszw, in which gene deletions accumulated as the number of transfers increased (Supplementary Fig. 9). One-step growth curves of these mutants showed that the more gene deletions a phage genome carried, the fewer progeny were produced per cell (Supplementary Fig. 10). The number of progeny per infected cell (burst size) decreased from 134 to 3 for T7 mutants and from 44 to 9 for seszw mutants (Fig. 5a). Moreover, as genes were deleted, the relative fitness of T7 mutants decreased from 31 to 11.5 doublings per hour, and that of seszw mutants from 9.3 to 5.5 doublings per hour (Fig. 5b). These results indicate that gene deletions in T7 and seszw cumulatively inhibit progeny production.
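The two growth metrics above can be computed as follows. Burst size is taken as the plateau titer of a one-step growth curve divided by the titer of infective centers during the latent period, and fitness as the log2 fold increase in phage titer per hour, a definition common in experimental phage evolution; whether the authors used exactly these formulas is an assumption, and the function names are hypothetical.

```python
import math


def burst_size(infective_centers_pfu: float, plateau_pfu: float) -> float:
    """Mean progeny released per infected cell in a one-step growth experiment."""
    return plateau_pfu / infective_centers_pfu


def doublings_per_hour(pfu_start: float, pfu_end: float, hours: float) -> float:
    """Phage fitness as the log2 fold increase in titer per hour of growth."""
    return math.log2(pfu_end / pfu_start) / hours


# Example: a 134-fold rise between latent period and plateau is a burst size of 134,
# and a 2**31-fold titer increase within one hour corresponds to 31 doublings per hour.
print(burst_size(1e5, 1.34e7))
print(doublings_per_hour(1e3, 1e3 * 2 ** 31, 1.0))
```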
We also sequenced the genomes of the mutant phages with the strongest host-killing capability, selected on the basis of their host kill curves from the T4 population at the 583rd transfer and the selz population at the 232nd transfer, and determined their one-step growth curves. Compared with the WT parental phages, the burst sizes decreased slightly, from 53 to 31 for the T4 mutant and from 35 to 25 for the selz mutant (Fig. 5a). The genomes of the Myoviridae phages T4 and selz are approximately 3-4 times larger than those of phages T7 and seszw and likely carry more accessory genes. The loss of the genes detected in this study may therefore not be sufficient to greatly impact phage growth 31,32. Overall, the mutants of these four different phages showed a similar trend: as the number of transfers increased, the host-killing ability of the mutants became weaker, and the burst size decreased. In particular, the parental genomes of T7 and seszw appeared to be nearly optimal for killing their hosts. This observation regarding T7 is consistent with a previous result obtained by simulation 46.
To detect the effects of phage genome reduction on phage particle size, the particles of the parental T7 and seszw and of their mutant progeny with minimal genomes and the largest plaques from the population of the 300th transfer (T7 300L and seszw 300L) were observed using TEM. Genome reduction caused no significant change in particle size for either phage (Supplementary Fig. 11). Next, we integrated 1.5-kbp and 2.5-kbp yeast genome fragments (with no known gene functions) into the genome of T7 300L and found that its fitness remained unchanged (Fig. 5b), implying that gene loss, rather than the change in genome size, was likely the main factor affecting phage growth in this study.
In addition, the one-step growth curves of the mutants of the four phages showed that the life cycle duration of the mutants (~17 min for T7, 50 min for seszw, 40 min for T4, and 60 min for selz) remained unchanged with the decrease in genome size (Fig. 5a). This observation is consistent with the conclusion of a previous study showing that holin controls the length of the infective cycle for lytic phages 47 .
In summary, our experiments indicate that the loss of some genes in the removable gene sets of phages can greatly impair phage growth. These genes likely help the phage exploit host metabolism, transcription, and translation to turn the host into a phage reproduction machine 48,49, but most of them have unknown functions, warranting further investigation.

Capability of CiPGr to generate minimal phage genomes
Here, following studies of minimal-genome cells 21, we defined the minimal phage genome of a given tailed phage as one that contains only the genes essential for self-reproduction in its host 11. We can thus presumably create viable minimal phage genomes by in silico deletion of the removable gene sets, together with the intergenic regions between adjacent removable genes, from the phage genomes, resulting in [...] (Supplementary Table 6 and Supplementary Table 7). The majority (T7 100%, T4 92.4%, seszw 94.1%, and selz 98.4%) of the gene deletions detected in the isolated mutant phage genomes showed relatively high frequencies (>5%) across the mutant phage populations during serial transfer (Fig. 2b and Supplementary Table 7). This suggests that these genes are less important for phage growth than the other genes in the removable gene sets. Based on these results, we tentatively categorize the phage genes into three groups: (i) genes that were absent from the isolated mutant phage genomes and had relatively high deletion frequencies (>5%) in the mutant phage populations were classified as nonessential; (ii) the other genes in the removable gene sets (frequency <5%) were classified as quasi-essential, as their loss would generate defective phages with fewer growth advantages than mutants lacking nonessential genes; and (iii) genes that were not detected in the removable gene sets were considered essential for phage growth. Therefore, CiPGr could reduce the phage genome by deleting nonessential genes and, in theory, generate the minimal viable phage genome (Fig. 5b and Supplementary Table 7).
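The three-way classification above amounts to a simple decision rule. The sketch encodes it with a 5% frequency cutoff; assigning removable genes with frequency above 5% that were nonetheless retained in all isolated mutants to the quasi-essential group is an assumption, because the text does not address that case. The function name is hypothetical.

```python
def classify_gene(in_removable_set: bool, absent_in_isolates: bool,
                  population_frequency: float, threshold: float = 0.05) -> str:
    """Tentative classification of a phage gene following the criteria in the text."""
    if not in_removable_set:
        return "essential"
    if absent_in_isolates and population_frequency > threshold:
        return "nonessential"
    # Remaining removable genes, including the edge case noted in the lead-in.
    return "quasi-essential"


print(classify_gene(True, True, 0.12))    # nonessential
print(classify_gene(True, False, 0.01))   # quasi-essential
print(classify_gene(False, False, 0.0))   # essential
```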
By comparing the mutant phage genomes, we identified T7 300L (36,937 bp; 8.4% of the genome deleted), seszw 300L (36,655 bp; 20.1% deleted), T4 583L (151,790 bp; 10.1% deleted, with premature stop codons in 41 genes), and selz 232L (121,813 bp; 21.3% deleted) as the minimal viable phage genomes obtained in this study (Fig. 5b and Supplementary Table 8). Nevertheless, the trends above (Fig. 4b, c, d) suggest that additional rounds of CiPGr could yield even smaller viable genomes with lower fitness. This indicates that, under given conditions, a balance must be struck between genome reduction and the growth fitness of phages 18,40. Even so, these minimal-genome mutants should be useful in further applications and contribute to understanding and redesigning phage genomes.

Discussion
To our knowledge, the genome reduction of a given tailed phage to generate a viable minimal phage genome has not previously been reported. Numerous studies on the minimal genomes of free-living cells have expanded our understanding of fundamental biological principles. Minimal genomes of tailed phages are of similar interest, because the extraordinary genetic diversity of these phages implies that they employ complex and heterogeneous propagation mechanisms within their host cells. Using CiPGr, we demonstrated for the first time the feasibility of compacting the genomes of different tailed phages to generate viable minimal phage genomes. The discrepancy between the removable gene sets detected by metagenomic sequencing of the mutant phage populations and the gene sets absent from the isolated mutant phage genomes suggests that these genes differ in their importance for phage growth. The isolated mutant phages showed that genome reduction is mostly detrimental to phage propagation, to varying degrees. Further investigation of the removable gene sets and of minimal-genome mutants capable of robust growth will likely deepen our understanding of the biological principles of tailed phages, paving the way for redesigning phages for various applications.
The CRISPR/Cas9-assisted HR method has been used to successfully edit several phage genomes 34-37. These studies highlighted multiple factors that should be considered before attempting to edit phage genomes with this tool, such as efficient spacer selection and extensive base modification 35,45, both of which may reduce editing efficiency. Because software programs perform poorly at evaluating and predicting efficient spacers for phage genomes 50, we designed multiple spacer sequences for each gene of interest to avoid editing failure. Our results illustrate that, in the iterative CiPGr process, the majority of the genes of interest were disrupted or deleted from the genomes of phages T7 (28/36, 77.8%), seszw (52/60, 86.7%), T4 (120/188, 63.8%), and selz (96/179, 53.6%) (Fig. 2c), showing the feasibility of our design. Nevertheless, editing was not detected for some genes. We propose three possible explanations. First, the corresponding designed spacers could be invalid. We calculated the number of designed spacers and the cassette abundance in the libraries for the genes that were not edited (Supplementary Fig. 5c). These genes had an average of 6, 5, 6, and 6.5 spacers per gene and 589, 154, 597, and 326 cassettes for T7, T4, seszw, and selz, respectively, suggesting that a shortage of spacers or cassettes is unlikely to explain the absence of editing (Supplementary Table 7). Second, base modifications could be present at the designed sites. Cas9 has been shown to cleave the modified T4 genome, although less efficiently than unmodified DNA 35. We examined the genomes of the other three phages but failed to find any known genes encoding modification enzymes akin to the T4 genes for dCMP hydroxymethylase (g42) and dCTPase (g56). Consistent with previous results 35, the gene deletion or disruption efficiency for T4 was much lower than that for T7 (Fig. 2g and Fig. 3).
Nevertheless, the percentage of removable genes of T4 (50.2%) was not very different from that of the other three tested phages (T7 46.7%, seszw 65.4%, and selz 56.2%). This suggests that the effect of genome modification on determining the removable gene sets by CiPGr was likely minor. Third, and most likely, the genes could be essential to these phages, although we cannot completely exclude the two explanations above. For example, we failed to detect five genes (gp0.3, 1.6, 1.8, 6.3, and 6.5) in the removable gene set of T7 that were previously considered nonessential 29. This was apparently not due to a shortage of the corresponding cassettes in the plasmid libraries (cassette abundance, log₂ > 5; Supplementary Fig. 5c). A likely explanation is that these genes are important for T7 growth in MG1655 (for example, gp0.3 counteracts the host DNA restriction system 51), so that their loss severely impairs the growth fitness of T7 in this host and leads to the rapid recession of the corresponding mutants in the populations. By extension, the hypothetical genes that resisted deletion in our study likely play a critical role in phage growth (Supplementary Table 7).
The iterative process ensures that mutants with a strong growth advantage predominate in the mutant population and are thus preferentially selected from the numerous heterogeneous mutants for the next round of CiPGr. Therefore, the order in which genes disappeared from the mutant phage genomes likely reflects their relative importance for phage growth: the earlier a gene was deleted or disrupted, the less important it was for phage growth. This was evident in the metagenomic sequencing data of the mutant populations (Fig. 2a) and in the genomes of the isolated mutant phages (Supplementary Figs. 7, 8, and 9b): the frequencies of 64.86% (±2.92%) of the removable genes of the four phages increased across the populations as the number of transfers increased (Fig. 2a and Supplementary Table 7).
The CiPGr process yielded thousands of generations of mutant phage progeny (Supplementary Table 5). A previous study indicated that Cas9 can drive the rapid evolution of phage T4 mutants within a short time 45. Unexpectedly, we observed numerous point mutations across the whole genomes of the isolated phage mutants. Counting the point mutations in the mutant phages of T7 (60) and seszw (58), we found a linear relationship between the number of transfers and the number of cumulative point mutations in the mutant phage genomes (Supplementary Fig. 12a). These mutations likely cannot be attributed to CEMs alone, as only 23.5%±8.3% (T7) and 12%±7.7% (seszw) were located in the spacer and PAM regions; most of them were likely spontaneous mutations 52. Moreover, 90%±6% (T7) and 86.7%±6.9% (seszw) of these point mutations were missense mutations, leading to amino acid changes. Because CiPGr positively selects the fittest mutants, these mutations likely benefit the growth of the mutant phages to some extent: the ratio of nonsynonymous to synonymous substitutions (Ka/Ks, including only genes without designed spacers) of most mutant phages (94.4% of T7 and 76.9% of seszw) was greater than 1 (2.89±1.66 for T7; 1.82±0.85 for seszw) (Supplementary Fig. 12b).
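A Ka/Ks ratio above 1 is the evidence offered here for positive selection. The sketch below follows the Nei-Gojobori approach in outline: the proportions of nonsynonymous and synonymous differences per site are converted to distances with the Jukes-Cantor correction and then divided. It assumes that the site and difference counts have already been obtained (counting sites requires enumerating codon changes, omitted here); the function names are hypothetical, and the study's exact procedure is not given in this section and may differ.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ka/Ks from precomputed difference and site counts (Nei-Gojobori style)."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("no synonymous differences; Ka/Ks is undefined")
    return ka / ks
```

Genes with no synonymous differences make the ratio undefined, which is one reason such per-genome summaries are usually reported only for genomes with enough substitutions.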
Although numerous mutant phages with different deleted or disrupted genes were generated during CiPGr, we could not correlate any individual mutant gene with the phenotype of the corresponding mutant phage. For example, the genome of seszw 210L had 6 more genes than that of seszw 290L, but their fitness differed little (7.12±0.19 for 210L and 7.09±0.40 for 290L). In addition, T7 300L and 300S had the same subset of genes deleted; however, 9 unique point mutations were found in the coding regions of 300S, and the kill time changed from 80 min (300L) to 190 min (300S). Therefore, the phenotype of a mutant phage is likely the cumulative result of the gene losses and point mutations in its genome. The exact function of each gene in the removable gene set requires further investigation.
In conclusion, CiPGr is a top-down phage genome reduction approach that successively eliminates nonessential genes to generate viable minimal phage genomes without prior knowledge of the phages. The minimal genomes generated in this study are likely not the ultimate minimal genomes of the parental phages, because this approach cannot delete all nonessential regions from the phage genomes; however, the information obtained can help redesign minimal genomes through bottom-up chemical synthesis. Nevertheless, this study demonstrates the usefulness and convenience of CiPGr as a powerful approach in phage biology, both for determining the nonessential gene sets of phages and for obtaining viable minimal phage genomes.

Declarations
Materials and Methods (see Supplementary information)