Viral contig acquisition and chitosanase AMG detection. The Integrated Microbial Genomes and Virome (IMG/VR) database (v3.0)41 was screened for sequences corresponding to predicted chitosanase genes. Viral contigs with genes annotated by a chitosanase HMM (pfam07335) were first identified by applying a JGI viral detection pipeline42. For a more conservative functional assignment, the viral chitosanase sequences were further checked against annotation databases, including EggNOG43, the carbohydrate-active enzyme database (CAZy)44 and the functional ontology assignments for metagenomes database (FOAM)45, using hmmsearch (HMMER v3.1b2)46 as described previously1, and by searching for sequence similarity to NCBI chitosanases using blastp47. The putative viral chitosanases were then screened against a set of lysozyme HMMs (PF13702, PF00959, PF04965, PF18013, PF00062 and a self-curated lysozyme HMM1 built using the lysozyme sequences deposited at NCBI Virus, accessed on 16 November 2020) to remove mis-annotated lysozymes.
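The lysozyme-removal step can be illustrated with a minimal sketch that parses hmmsearch `--tblout` tables (target name in the first column, full-sequence E-value in the fifth) and keeps proteins that hit the chitosanase HMM but not any lysozyme HMM. The function names, the decision rule and the E-value cutoff of 1e-5 are assumptions made for illustration; the thresholds actually used are not stated here.

```python
def parse_tblout(lines):
    """Return {target_name: best full-sequence E-value} from hmmsearch --tblout lines."""
    best = {}
    for line in lines:
        # Skip blank lines and the '#' comment/header lines of the table.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if target not in best or evalue < best[target]:
            best[target] = evalue
    return best


def confident_chitosanases(chitosanase_hits, lysozyme_hits, max_evalue=1e-5):
    """Keep proteins with a chitosanase hit and no lysozyme hit at the cutoff.

    max_evalue is a hypothetical cutoff, not a value taken from the text.
    """
    keep = set()
    for protein, evalue in chitosanase_hits.items():
        if evalue > max_evalue:
            continue  # chitosanase hit too weak
        if lysozyme_hits.get(protein, float("inf")) <= max_evalue:
            continue  # also matches a lysozyme HMM: treated as mis-annotated
        keep.add(protein)
    return keep
```

In practice the two tables would come from separate hmmsearch runs of the same protein set against the chitosanase and lysozyme profiles.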
For a confident assignment of the chitosanase genes as viral AMGs, the genomic content of the viral contigs carrying chitosanase genes retained from the above steps was inspected. Genes on the viral contigs were predicted and translated using Prodigal48. The protein sequences were annotated against the EggNOG bacterial and archaeal databases and three viral databases as previously described1,49, in addition to the 7185 microbial-specific and 8773 viral-specific HMMs implemented in CheckV (v0.7.0)50. The chitosanase AMG candidates were classified into five categories according to their gene positions on viral contigs and the presence or absence of viral hallmark genes, as described previously1. Only viral contigs with high confidence scores (categories 0-2) for chitosanase AMGs were retained for subsequent analyses (Supplementary Information Table 1).
Viral contig clustering and host prediction. The viral contigs with chitosanase AMGs were clustered with Viral RefSeq genomes (v201) based on a scored protein-sharing matrix. A clustering network including pairwise interactions was generated using vConTACT (v2.0.9.10)51 with default parameters. The soil viral contigs did not share sufficient genes with previously deposited reference viruses to enable a confident taxonomic assignment (data not shown).
The putative hosts of the viral contigs that carried chitosanase AMGs were predicted using three published bioinformatic tools: 1) WIsH52 (best-hit), 2) VirHostMatcher53 (best-hit) and 3) Prokaryotic virus Host Predictor (PHP)54 (‘consensus’). The final host taxonomy of a viral contig was assigned when results from at least two of the three tools reached consensus.
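The two-of-three consensus rule can be sketched as follows. This is an illustrative implementation only: the function name and the example taxa are hypothetical, and it assumes that each tool's prediction has already been reduced to a single taxon label at a common rank.

```python
from collections import Counter


def consensus_host(predictions, min_agree=2):
    """Return the taxon predicted by at least `min_agree` tools, else None.

    `predictions` maps a tool name to its predicted host taxon
    (None or "" when the tool made no call).
    """
    counts = Counter(taxon for taxon in predictions.values() if taxon)
    if not counts:
        return None
    taxon, n_tools = counts.most_common(1)[0]
    return taxon if n_tools >= min_agree else None


# Hypothetical example: two of the three tools agree.
example = {"WIsH": "Streptomyces", "VirHostMatcher": "Bacillus", "PHP": "Streptomyces"}
print(consensus_host(example))
```

With three tools and a threshold of two, at most one taxon can satisfy the rule, so no tie-breaking is needed.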
Phylogenetic analysis of chitosanases. To delineate the phylogenetic relatedness of the detected viral chitosanases to GH75 chitosanases in other taxa, a phylogenetic tree was constructed based on a multiple sequence alignment of protein sequences of archaeal, bacterial, fungal and viral chitosanases. The tree was re-rooted using a bacteriophage lysozyme (YP_006987285.1). To cover the diverse genetic space across all domains of life, we first queried the NCBI protein database for ‘chitosanase’ (https://www.ncbi.nlm.nih.gov/protein, accessed on 11 October 2021) and then screened the retrieved sequences with the GH75 chitosanase pfam (PF07335). Sequences of the bacterial and fungal GH75 chitosanases used to identify key residues in the active sites were also included in the reference set13,14. The reference sequences were then clustered at 70% amino acid identity to remove redundancy using CD-HIT (v4.8.1)55, and the representative sequence of each cluster longer than 150 amino acids was included in the final reference set, resulting in two sequences from archaea, 230 from bacteria and 180 from fungi. The viral chitosanases and the reference sequences were aligned using MAFFT (v7)56 with default parameters. The multiple sequence alignment was manually inspected and adjusted based on the positions of the four key residues of the predicted active site across the viral and reference sequences (Supplementary Information Figure 1). The phylogenetic tree was built using FastTree (v2.1)57 with default parameters.
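The length filter applied to the cluster representatives can be sketched as below. This is a minimal illustration that assumes the representatives are available as plain FASTA text; the function names are hypothetical, and "longer than 150 amino acids" is read as strictly greater than 150.

```python
def read_fasta(text):
    """Parse FASTA text into {sequence_id: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the first word of the header
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def keep_long(records, min_len=150):
    """Keep sequences strictly longer than `min_len` residues."""
    return {name: seq for name, seq in records.items() if len(seq) > min_len}
```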
Protein Expression and Purification. The gene encoding a putative soil viral chitosanase (Ga0126380_1000012531: noted with a double asterisk in Fig. 1) was chemically synthesized and inserted into the NdeI site of pET28a, inclusive of a 20-residue extension at the N-terminus (MGSSHHHHHHSSGLVPRGSH-) containing a poly-histidine metal affinity tag (bold) and a thrombin protease cleavage site (underlined) in the primary amino acid sequence of the expressed protein. The recombinant plasmid was used to transform chemically competent Escherichia coli BL21(DE3) (Invitrogen, Carlsbad, CA), and ~1 mL glycerol stocks (~15% glycerol, LB medium, OD600nm of ~0.8) were prepared from a single colony and frozen (-80 °C) for future use. A glycerol stock was used to seed 25 mL of LB medium that was grown to an OD600nm of ~0.8 and then transferred to 750 mL of autoinduction LB medium58 (2 L flasks, 200 rpm shaker, 0.34 μg/μL kanamycin, 37 °C). Upon reaching an OD600nm of approximately 1, the temperature was lowered to 30 °C. The cells were harvested ~16 h later (next day) by gentle centrifugation and then frozen (-80 °C). Cells were lysed by thawing the frozen pellet followed by sonication (~1 min) before and after three passes through a French Press (SLM Aminco, Rochester, NY). Following centrifugation, the protein in the soluble fraction was purified using a conventional two-step purification protocol: metal chelate affinity chromatography on a 20 mL Ni-Agarose 6 FastFlow column (GE Healthcare, Piscataway, NJ) followed by gel-filtration chromatography on a Superdex HiLoad 26/60 column (GE Healthcare, Piscataway, NJ)59. Fractions containing the target protein after the last column step were concentrated to 2 – 5 mg/mL (Protein Buffer: 100 mM NaCl, 20 mM Tris, 1 mM DTT, pH 7) and stored at 4 °C until used for crystallization or enzyme assays. Yields of 2 – 4 mg purified protein were obtained per liter of LB medium.
The same protocol was applied to prepare two modified proteins, each containing a single point substitution (D148N or E157Q). Mutagenesis was performed as previously described60.
Chitosanase activity assays. Wildtype V-Csn and the two modified proteins were tested for endo-chitosanase activity using an azurine cross-linked (AZCL) chitosan substrate (AZCL-chitosan; Megazyme, Wicklow, Ireland)61. Stock solutions (1200 μg/mL) of each protein were prepared in Protein Buffer along with AZCL-chitosan suspensions (250 μg/mL) at pH 4.3, 5.1, and 6.5 in 40 mM sodium acetate, 100 mM NaCl, 1 mM DTT. The reactions were performed in triplicate, at room temperature, by adding 17 μL of protein (20 μg) to 100 μL of AZCL-chitosan in a 500 μL Eppendorf tube. The tubes were agitated by rotation (40 rpm) in a Multi-Purpose Tube Rotator (Fisher Scientific). Activity was monitored by pelleting the substrate with brief centrifugation and measuring the absorbance of released azurine-linked product at 590 nm (NanoDrop 2000c; Thermo Scientific) using a 2 μL aliquot. Blank reactions showed no release of azurine-linked product in the absence of protein and pH measurements before and after the reaction varied less than 0.1 pH unit.
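The quantities in the assay mix follow from the stock concentrations and volumes given above. The short calculation below is a sketch of that arithmetic only (the function and key names are hypothetical); it confirms that 17 μL of a 1200 μg/mL stock delivers about 20 μg of protein and gives the final concentrations in the 117 μL reaction, assuming volumes are additive.

```python
def assay_mix(protein_stock_ug_per_ml, protein_ul, substrate_ug_per_ml, substrate_ul):
    """Return protein mass and final concentrations for a two-component mix."""
    total_ul = protein_ul + substrate_ul
    protein_ug = protein_stock_ug_per_ml * protein_ul / 1000.0  # ug/mL * uL / 1000
    substrate_ug = substrate_ug_per_ml * substrate_ul / 1000.0
    return {
        "total_ul": total_ul,
        "protein_ug": protein_ug,
        "protein_final_ug_per_ml": protein_ug / total_ul * 1000.0,
        "substrate_final_ug_per_ml": substrate_ug / total_ul * 1000.0,
    }


mix = assay_mix(1200, 17, 250, 100)
print(round(mix["protein_ug"], 1))               # mass of protein added (ug)
print(round(mix["protein_final_ug_per_ml"]))     # final protein concentration
print(round(mix["substrate_final_ug_per_ml"]))   # final AZCL-chitosan concentration
```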
Crystallization, X-ray data collection and processing. Initial crystallization conditions for V-Csn were obtained using the hanging drop method employing the Top96 screen (Anatrace). Crystals were observed in multiple conditions. Crystals from several conditions were harvested and flash-cooled in liquid nitrogen in their respective crystallization conditions augmented with 20% ethylene glycol. The crystals were sent to SSRL for diffraction screening on beamline BL9-2. Three conditions gave crystals that diffracted to high resolution: condition #45 (0.2 M ammonium sulfate, 0.1 M sodium acetate pH 4.6, 30% MMePEG2000) in space group C2 with unit cell dimensions a=108.84 Å, b=47.63 Å, c=45.55 Å, β=97.8°, with one monomer in the asymmetric unit (AU); condition #38 (0.1 M citrate pH 5.5, 20% PEG3000) in space group C2 with unit cell dimensions a=163.30 Å, b=46.00 Å, c=73.56 Å, β=92.3°, with two monomers in the AU; and condition #20 (0.2 M ammonium sulfate, 0.1 M bis-tris pH 5.5, 25% PEG3350) in space group C2 with unit cell dimensions a=80.47 Å, b=35.76 Å, c=80.66 Å, β=118.5°, with one monomer in the AU.
Data sets were collected from single crystals in conditions #45 and #38. For the condition #45 crystal (designated apo1), 1800 0.2° images were collected on BL12-2 using X-rays at 17000 eV (0.72929 Å) and a Pilatus 6M PAD detector running in shutterless mode. The images were processed with XDS62 and scaled using AIMLESS63. The final data set comprised 174574 unique reflections to 0.89 Å resolution. For the condition #38 crystal (apo2), 1800 0.2° images were collected on BL9-2 using X-rays at 12658 eV (0.97946 Å) and a Pilatus 6M PAD detector running in shutterless mode. The images were processed with XDS62 and scaled using AIMLESS63, and the final data set comprised 117982 unique reflections to 1.35 Å resolution. Additional data collection and processing statistics for both crystal forms are given in Extended Data Table 1.
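The photon energies and wavelengths quoted above are related by λ = hc/E, with hc ≈ 12398.42 eV·Å. The sketch below (function names are illustrative) reproduces the quoted wavelengths to within about 1e-4 Å, the small residual presumably reflecting rounding or a slightly different conversion constant, and also gives the total rotation range covered by each data set.

```python
HC_EV_ANGSTROM = 12398.419843  # Planck constant times speed of light, in eV*Angstrom


def wavelength_angstrom(energy_ev):
    """Convert an X-ray photon energy (eV) to a wavelength (Angstrom)."""
    return HC_EV_ANGSTROM / energy_ev


def total_rotation_deg(n_images, osc_deg):
    """Total crystal rotation covered by n_images of osc_deg each."""
    return n_images * osc_deg


for energy in (17000, 12658):
    print(energy, "eV ->", round(wavelength_angstrom(energy), 4), "Angstrom")
print(round(total_rotation_deg(1800, 0.2)), "degrees per data set")
```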
For experimental phasing, a KBr soaking solution was prepared by dissolving solid KBr in condition #45 crystallization buffer augmented with 25% glycerol until a saturated solution was obtained (as determined visually under a microscope). This solution was diluted with fresh buffer to form a 1/8 saturated crystal soaking solution. Several apo1 crystals were swished quickly in this solution and flash-cooled in liquid nitrogen. Diffraction data sets were collected from KBr-soaked apo1 crystals on beamline BL12-2 at the bromide edge (13481 eV, 0.91967 Å). A total of 3600 images were collected with a rotation angle of 0.2°/image, using the inverse beam method and 20° wedges. The images were processed with XDS62 and scaled using AIMLESS63. Additional statistics are given in Extended Data Table 1. Initial analysis of the data indicated a strong anomalous signal from the bromide extending to approximately 1.7 Å resolution.
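In inverse-beam collection, each wedge at rotation angle φ is followed by the matching wedge at φ + 180°, so that Friedel mates are recorded close together in time. The sketch below lays out such a schedule for the stated parameters (3600 images, 0.2° per image, 20° wedges). It assumes that the images are split evenly between the direct and inverse sweeps and that collection starts at φ = 0°; neither detail is given in the text, and the function name is hypothetical.

```python
def inverse_beam_schedule(total_images, osc_deg, wedge_deg, start_deg=0.0):
    """Return a list of (wedge_start_angle_deg, n_images) in collection order.

    Assumes an even split between direct and inverse wedges (an assumption,
    not a detail taken from the text).
    """
    images_per_wedge = round(wedge_deg / osc_deg)
    n_wedges = total_images // images_per_wedge
    if n_wedges % 2 or n_wedges * images_per_wedge != total_images:
        raise ValueError("images do not divide into direct/inverse wedge pairs")
    schedule = []
    for k in range(n_wedges // 2):
        phi = start_deg + k * wedge_deg
        schedule.append((phi % 360.0, images_per_wedge))            # direct wedge
        schedule.append(((phi + 180.0) % 360.0, images_per_wedge))  # inverse wedge
    return schedule


schedule = inverse_beam_schedule(3600, 0.2, 20.0)
print(len(schedule), "wedges; first three:", schedule[:3])
```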
Structure determination and refinement. The V-Csn structure was solved by Br-SAD (bromide single-wavelength anomalous diffraction) methods implemented in PHENIX17. Following solvent flattening and density modification, the overall figure of merit (FOM) was 0.363 for 16 bromide sites. Autobuilding in PHENIX generated a model comprising 221 out of 224 expected residues. Initial refinement with phenix.refine19 gave an Rwork and Rfree of 0.158 and 0.187, respectively. The model was completed using COOT18 and refined further with phenix.refine using the apo1 data to 0.89 Å resolution. Water molecules were added at structurally and chemically relevant positions, and the atomic displacement parameters for all atoms in the structure were refined isotropically. The apo2 structure was solved by molecular replacement using the program MOLREP64 from the CCP4 suite65, using the refined apo1 structure as the search model. Final refinement statistics for the two apo-V-Csn structures are given in Extended Data Table 2.
Chitosanase mutant and substrate structures. V-Csn mutants D148N and E157Q were screened for crystallization using conditions #20, #38 and #45, and crystals were observed in all three. Diffraction data sets were collected from single D148N and E157Q crystals from condition #45. For the D148N crystal, 1800 images (0.2° rotation/image) were collected on BL12-2, and the data were processed and scaled with XDS62 and AIMLESS63. For the E157Q crystal, 1850 images were collected on BL12-2, and the data were processed and scaled with XDS62 and AIMLESS63. Data collection statistics are given in Extended Data Table 3. Both structures were solved by molecular replacement with MOLREP64 using the refined wild-type V-Csn structure as the starting model, with all water molecules removed. The D148N and E157Q structures were refined with phenix.refine19, and final statistics are given in Extended Data Table 4.
The E157Q-substrate complex was prepared by dissolving 0.06 mg of chitohexaose (Biosynth) in 10 μL of E157Q at 3.3 mg/mL, giving a final chitohexaose concentration of around 5 mM. The complex was incubated at 4 °C for 1 h prior to setting up sitting drops against crystallization condition #45. The crystallization drops were streak-seeded several hours after setup, and crystals of the complex were observed in all drops overnight. The crystals were morphologically similar to wild-type and mutant crystals grown under the same conditions. The crystals were transferred into crystallization buffer augmented with 25% glycerol and flash-cooled in liquid nitrogen. Diffraction data were collected at BL12-2. A total of 1800 images were collected, and the data were processed and scaled with XDS62 and AIMLESS63. The E157Q-substrate complex structure was solved by molecular replacement with MOLREP64 using the refined wild-type V-Csn structure with all water molecules removed as the starting model, and refined with phenix.refine19. Data collection and refinement statistics are given in Extended Data Tables 3 and 4.
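The quoted ligand concentration can be checked from the mass and volume. The salt form of the chitohexaose is not stated, so the sketch below evaluates two assumed molecular weights: the free base (C36H68N6O25, about 985 g/mol) and the hexahydrochloride (about 1204 g/mol). Both values and the function name are assumptions made for illustration; the stated figure of around 5 mM matches the hydrochloride value more closely.

```python
def millimolar(mass_mg, volume_ul, mw_g_per_mol):
    """Concentration in mM for mass_mg dissolved in volume_ul."""
    grams_per_litre = mass_mg / volume_ul * 1000.0  # mg/uL equals 1000 g/L
    return grams_per_litre / mw_g_per_mol * 1000.0  # mol/L converted to mmol/L


# Assumed molecular weights (not given in the text).
MW_FREE_BASE = 984.96
MW_HEXAHYDROCHLORIDE = 1203.72

print(round(millimolar(0.06, 10, MW_FREE_BASE), 1), "mM if free base")
print(round(millimolar(0.06, 10, MW_HEXAHYDROCHLORIDE), 1), "mM if hexahydrochloride")
```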
Structure modeling by AlphaFold2. The AlphaFold2 structure predictions were run using either a locally installed version of the software retrieved from the official GitHub repository (https://github.com/deepmind/alphafold) or the ColabFold AlphaFold2 notebook on Google Colaboratory (https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb). Solvent-accessible surfaces were calculated with PyMOL (v2.5.2) (Schrödinger) and ICM-Pro (v3.8-6a) (Molsoft), using a probe radius of 1.4 Å (equivalent to the radius of a single water molecule). The electrostatic surfaces were generated with the Adaptive Poisson-Boltzmann Solver (APBS) plugin for PyMOL (v2.5.2).
Data availability. The atomic coordinates and structure factors for the protein structures have been submitted to the Protein Data Bank as follows: V-Csn apo1, PDB code 7TVL; V-Csn apo2, 7TVM; V-Csn-D148N, 7TVN; V-Csn-E157Q, 7TVO; V-Csn-E157Q chitohexaose complex, 7TVP. The wwPDB X-ray structure validation reports are included in Supplementary Information Figure 3.
Code availability. No custom code or custom mathematical algorithms were used in this study.
41. Roux, S. et al. IMG/VR v3: An integrated ecological and evolutionary framework for interrogating genomes of uncultivated viruses. Nucl. Acids Res. 49, D764–D775 (2021).
42. Paez-Espino, D., Pavlopoulos, G. A., Ivanova, N. N. & Kyrpides, N. C. Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data. Nat. Protoc. 12, 1673-1682 (2017).
43. Huerta-Cepas, J. et al. EggNOG 4.5: A hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucl. Acids Res. 44, D286–D293 (2016).
44. Cantarel, B. L. et al. The Carbohydrate-Active EnZymes database (CAZy): An expert resource for glycogenomics. Nucl. Acids Res. 37, D233-D238 (2009).
45. Prestat, E. et al. FOAM (Functional Ontology Assignments for Metagenomes): A Hidden Markov Model (HMM) database with environmental focus. Nucl. Acids Res. 42, e145 (2014).
46. Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
47. Johnson, M. et al. NCBI BLAST: A better web interface. Nucl. Acids Res. 36, W5–W9 (2008).
48. Hyatt, D. et al. Prodigal: Prokaryotic gene recognition and translation initiation site identification. BMC Bioinform. 11, 119 (2010).
49. Wu, R. et al. Moisture modulates soil reservoirs of active DNA and RNA viruses. Commun. Biol. 4, 992 (2021).
50. Nayfach, S. et al. CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat. Biotechnol. 39, 578-585 (2021).
51. Bolduc, B. et al. vConTACT: An iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ 5, e3243 (2017).
52. Galiez, C., Siebert, M., Enault, F., Vincent, J. & Söding, J. WIsH: who is the host? Predicting prokaryotic hosts from metagenomic phage contigs. Bioinformatics 33, 3113-3114 (2017).
53. Ahlgren, N. A., Ren, J., Lu, Y. Y., Fuhrman, J. A. & Sun, F. Alignment-free d2* oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences. Nucl. Acids Res. 45, 39-53 (2017).
54. Lu, C. et al. Prokaryotic virus host predictor: A Gaussian model for host prediction of prokaryotic viruses in metagenomics. BMC Biol. 19, 5 (2021).
55. Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
56. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 30, 772-780 (2013).
57. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490 (2010).
58. Studier, F. W. Protein production by auto-induction in high-density shaking cultures. Prot. Expr. Purif. 41, 207-234 (2005).
59. Buchko, G. W., Clifton, M. C., Wallace, E. G., Atkins, K. A. & Myler, P. J. Backbone chemical shift assignments and secondary structure analysis of the U1 protein from the Bas-Congo virus. Biomol. NMR Assign. 11, 51–56 (2017).
60. Wrenbeck, E. E. et al. Plasmid-based one-pot saturation mutagenesis. Nature Methods 13, 928–930 (2016).
61. Schönbichler, A., Díaz-Moreno, S. M., Srivastava, V. & McKee, L. S. Exploring the Potential for Fungal Antagonism and Cell Wall Attack by Bacillus subtilis natto. Front. Microbiol. 11, 521 (2020).
62. Kabsch, W. XDS. Acta Crystallogr. D66, 125-132 (2010).
63. Evans, P. R. & Murshudov, G. N. How good are my data and what is the resolution? Acta Crystallogr. D69, 1204–1214 (2013).
64. Vagin, A. & Teplyakov, A. MOLREP: An automated program for molecular replacement. J. Appl. Cryst. 30, 1022-1025 (1997).
65. Winn, M. D. et al. Overview of the CCP4 suite and current developments. Acta Crystallogr. D67, 235-242 (2011).
66. Weiss, M. S. Global indicators of X-ray data quality. J. Appl. Cryst. 34, 130-135 (2001).
67. Karplus, P. A. & Diederichs, K. Linking crystallographic model and data quality. Science 336, 1030-1033 (2012).
68. Evans, P. R. Scaling and assessment of data quality. Acta Crystallogr. D62, 72-82 (2006).
69. Chen, V. B. et al. MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallogr. D66, 12-21 (2010).