The Mimivirus 1.2 Mb dsDNA genome is elegantly organized into a nuclear-like weapon

Chantal Abergel (chantal.abergel@igs.cnrs-mrs.fr) French National Centre for Scientific Research https://orcid.org/0000-0003-1875-4049 Alejandro Villalta Casares French National Centre for Scientific Research https://orcid.org/0000-0002-7857-7067 Emmanuelle Quemin University of Hamburg Alain Schmitt French National Centre for Scientific Research Jean-Marie Alempic French National Centre for Scientific Research Audrey Lartigue French National Centre for Scientific Research Vojta Prazak University of Oxford Daven Vasishtan Oxford Agathe Colmant French National Centre for Scientific Research Flora Honore French National Centre for Scientific Research https://orcid.org/0000-0002-0390-8730 Yohann Coute University Grenoble Alpes, CEA https://orcid.org/0000-0003-3896-6196 Kay Gruenewald University of Oxford https://orcid.org/0000-0002-4788-2691 Lucid Belmudes Univ. Grenoble Alpes, CEA, INSERM, IRIG, BGE

while protecting the viral genome, in a state ready for immediate transcription upon unwinding in the host cytoplasm. We expect that a dedicated energy-driven machinery is required for the assembly of this rod-shaped giant viral chromosome and its further compaction in the membrane-limited, electron-dense nucleoid characteristic of mature Mimivirus particles 2,4,5 . The parsimonious involvement of the same protein in two functionally unrelated substructures of the virion is also unexpected for a giant virus with a thousand genes at its disposal.

Main
Giant viruses were discovered with the isolation of Mimivirus from Acanthamoeba species 1,6 .
These viruses now represent a highly diverse group of dsDNA viruses infecting unicellular eukaryotes 7 , which play an important role in the environment 8 . They also challenge the canonical definitions of viruses 9,10 as they can encode central translation components as well as a complete glycosylation machinery 11 , among other unique features.
Mimivirus has been the most extensively studied giant virus over the years 12-16 . Inside the capsids, a lipidic membrane delineates an internal compartment (~350 nm in diameter), which we refer to as the nucleoid, containing the viral genome together with all the proteins necessary to initiate the replicative cycle within the host cytoplasm 17-19 .
Acanthamoeba cells engulf Mimivirus particles, fooled by their bacteria-like size and the sweet taste of the heavily glycosylated decorating fibrils 1,6 . Once in the vacuole, the Stargate portal located at one specific vertex of the icosahedron opens up 20 , enabling the viral membrane to fuse with that of the phagosome and deliver the nucleoid into the host cytoplasm 17,19 . Subsequently, the nucleoid starts losing its electron-dense appearance, transcription begins and the early viral factory is formed 18,21,22 . Previous AFM studies of the Mimivirus infectious cycle revealed a highly condensed nucleoprotein mass enclosing the organized DNA in the nucleoid 2 . We have developed an in vitro protocol for particle opening that leads to the release of ~30 nm-wide rod-shaped fibres of micrometric lengths.
The structure appears to be expelled from the icosahedral capsids, similarly to a party whistle, as if packed under high pressure (Fig. 1).

Fig. 1 (legend, beginning missing): [...] in the first and fifth slices; the helically folded DNA in the second and fourth slices; the lumen of the fibre in the third slice. Thickness of the slices is 1.1 nm. Distance between the tomographic slices is 4.4 nm between the first and second and the fourth and fifth slices, and 6.6 nm between the second, third and fourth slices. Multiple states of unwinding can be observed in D, E and F. Scale bars 100 nm.
Various unwinding states of the fibre were observed, ranging from the most compact rod-shaped structures to unfolded ribbons (Fig. 1). After optimizing the extraction on different strains of group-A Mimiviruses, we focused on an isolate from La Réunion Island (Mimivirus reunion) that gave high yields of fibres, which were purified on a sucrose gradient (Extended data Methods). We first confirmed the presence of DNA in the structure by agarose gel electrophoresis (Extended data Fig. 1) and then used cryo-electron microscopy (cryo-EM) bubblegram acquisitions 23,24 to probe its location. Surprisingly, the sample could sustain very high electron doses, at least 600 e⁻/Å² for unwinding helices and up to 825 e⁻/Å² for long compact ones, before bubbles appeared as a sign of hydrogen gas trapped inside the DNA-protein complex upon protein damage 25 (Extended data Fig. 2). No bubbles could be detected in unfolded ribbons, suggesting that in addition to compacting the Mimivirus genome, the rod-shaped structure could also protect it inside the proteinaceous shell. Assuming that each turn of B-form DNA advances by ~34 Å per 10 bp, the Mimivirus 1.2×10⁶ bp linear genome would extend over ~400 µm, corresponding to a volume of 1.3×10⁶ nm³ (~300 µm and ~1×10⁶ nm³ if in the A-form) 26 . The spherical nucleoid (~350 nm in diameter) corresponds to an approximate volume of 2.2×10⁷ nm³, which could accommodate over a dozen genomes in linear, unprotected form. However, it can only hold ~30 µm of a 30-nm-wide flexible cylindrical genomic fibre, which directly implies that the Mimivirus genome cannot simply run linearly along the centre of the fibre, but must be folded back into the structure. Several DNA compaction solutions to this geometrical problem have been described. For instance, the DNA of filamentous viruses infecting archaea 27,28 is wrapped with proteins to form a ribbon, which in turn folds into a helical rod around a hollow lumen.
In contrast, the chromatin of cellular eukaryotes consists of DNA wrapped around histone complexes 29 . The latter organization would be consistent with previous evolutionary hypotheses linking giant DNA viruses with the emergence of the eukaryotic nucleus 30-33 . In order to shed light on the Mimivirus packaging strategy, we performed a structural analysis of the purified genomic fibre using cryo-EM single-particle analysis and sub-volume averaging. The different stages of decompaction initially observed by negative staining and cryo-EM (Fig. 1) resulted in a highly heterogeneous dataset for single-particle analysis. After pre-processing and manual picking, 2D classes obtained in Relion 34 [...] The least populated intermediate conformation (Fig. 2, Cl2) was not analysed further, and we focused on the most compact (Cl1) and the least compact (Cl3) datasets for independent 3D reconstructions. After 3D classification, we identified for each cluster (Cl1, Cl3) two independent structures of approximately the same width, both made of a ~8 nm-thick proteinaceous external shell. The most compact conformation (Cl1) converged to two ~29 nm-wide left-handed helices in which the dsDNA strands line the inside of the protein shell and leave a hollow channel. The most abundant 3D class (13,252 particles) corresponds to a 5-start helix with 5 dsDNA strands following the helical symmetry, located 6.6 Å away from the protein shell, with a ~9 nm-wide hollow channel. The second 3D class (12,294 particles) is a 6-start helix with 6 dsDNA strands lining the proteinaceous shell (5.8 Å spacing) and a ~11 nm hollow channel. In Cl3, the least compact form, the two structures correspond to ~33.2 nm-wide left-handed helices with either a 5- or a 6-start (5,036 and 6,320 particles respectively), in which the DNA is no longer visible inside the ~17 nm-wide central channel (Fig. 2, Fig. 3, Table 1).
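The packing estimate used earlier to argue that the genome must be folded back into the fibre can be reproduced with a few lines of arithmetic. The sketch below is only an illustration of that reasoning, not part of the authors' analysis: the genome size, B-form rise, nucleoid diameter and fibre width are taken from the text, while the 1 nm duplex radius and all function and variable names are assumptions introduced here. It treats the nucleoid as a perfect sphere and the fibre as a solid cylinder that fills it without gaps, so the fibre length it returns is an upper bound.

```python
import math

# Values quoted in the text
GENOME_BP = 1.2e6              # Mimivirus genome size, base pairs
RISE_B_NM_PER_BP = 3.4 / 10    # B-form: ~34 Å (3.4 nm) per 10 bp
NUCLEOID_DIAMETER_NM = 350.0   # approximate nucleoid diameter
FIBRE_DIAMETER_NM = 30.0       # approximate genomic fibre width

# Assumption (not stated in the text): B-form duplex radius of ~1 nm
DNA_RADIUS_NM = 1.0


def sphere_volume(diameter_nm: float) -> float:
    """Volume of a sphere in nm^3."""
    return 4.0 / 3.0 * math.pi * (diameter_nm / 2.0) ** 3


def cylinder_volume(diameter_nm: float, length_nm: float) -> float:
    """Volume of a straight cylinder in nm^3."""
    return math.pi * (diameter_nm / 2.0) ** 2 * length_nm


def packing_estimates() -> dict:
    """Return the back-of-the-envelope quantities discussed in the text."""
    genome_length_nm = GENOME_BP * RISE_B_NM_PER_BP
    dna_volume_nm3 = cylinder_volume(2.0 * DNA_RADIUS_NM, genome_length_nm)
    nucleoid_volume_nm3 = sphere_volume(NUCLEOID_DIAMETER_NM)
    # Longest 30-nm-wide cylinder whose volume equals the nucleoid volume
    fibre_cross_section_nm2 = math.pi * (FIBRE_DIAMETER_NM / 2.0) ** 2
    max_fibre_length_nm = nucleoid_volume_nm3 / fibre_cross_section_nm2
    return {
        "genome_length_um": genome_length_nm / 1000.0,
        "dna_volume_nm3": dna_volume_nm3,
        "nucleoid_volume_nm3": nucleoid_volume_nm3,
        "naked_genomes_per_nucleoid": nucleoid_volume_nm3 / dna_volume_nm3,
        "max_fibre_length_um": max_fibre_length_nm / 1000.0,
    }


estimates = packing_estimates()
for name, value in estimates.items():
    print(f"{name}: {value:.3g}")
```

Under these assumptions the linear B-form genome is roughly an order of magnitude longer than the longest 30-nm-wide fibre the nucleoid could hold, which is the basis for concluding that several dsDNA strands must run side by side within the fibre.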
The map (solid) and cartoon representation of the qu_946 dimers are colour coded according to each start of the 5-start helix. To facilitate the visualisation, the protein shell is presented alone in the front view of the fibre, while it is embedded into the map in the back, expanding the size of the fibre. B] Orthographic view of the sliced structure to visualize the 5 DNA strands lining the protein shell. C] Map of the best 3D reconstruction obtained after refinement in Relion 34,35 . D] Orthogonal view of the clipped map (transparent) into which the qu_946 model (solid) and the 5 DNA strands were fitted. E] Final cryo-EM map coloured by local resolution from 4.3 Å (blue) to 13.19 Å (red), with the Euler angle distribution of the particles used for the 3D reconstruction in red. The orthogonal view of the colour-coded map is at the bottom. The resolution ranges between 4.3 and 8 Å for the protein shell and is ~9 Å for the DNA. Images were produced using ChimeraX 36 ; the length of the fibre was defined to present the 5 starts of the helix.
Mass spectrometry-based proteomic analyses identified the homologs of two GMC-oxidoreductases, qu_143 and qu_946 (R135 and L894/93 in Mimivirus), as the main components of the purified genomic fibre (Extended data Table 2). This was confirmed by successfully fitting the Mimivirus R135 GMC-oxidoreductase dimeric structure 3 (PDB 4Z24, lacking the 50 amino acid long cysteine-rich N-terminal domain) into the ~8 nm external proteinaceous shell after 3D refinement of the well-resolved 5-start structure from Cl1 (Figs. 3 & 4). Quite unexpectedly, GMC-oxidoreductases were already known to compose the fibrils surrounding Mimivirus capsids 37 . Interestingly, the proteomic analyses measured different sequence coverages for the GMC-oxidoreductases depending on whether they were associated with the peripheral fibrils or the genomic fibre, with the striking absence of the N-terminal domain in the genomic fibre (Extended data Fig. 6). There is uninterpreted electron density in the compact 5-start form in which 5 additional N-terminal residues of the qu_946 sequence could be built for each monomer, compared to the reference X-ray structure. This strikingly brings the cysteines of each monomer (C51 in qu_946) close enough to form a disulphide bridge, directly after the 50 amino acid domain not covered in the proteomic analysis of the genomic fibre (Fig. 4, Extended data Fig. 6 & 8). Inspection of individual fibres in the tomograms and sub-volume averaging confirmed the co-existence of 5- and 6-start helices containing DNA, while some intermediates in structure decompaction were also observed. In unwinding or broken fibres, masses that could correspond to proteins inside the lumen were sometimes visible, as well as directly dissociated DNA fragments, either in the central channel or at the breakage points of the fibres at their periphery. Densities corresponding to the dimer composing the protein shell were also commonly observed on free DNA strands (Extended data Fig. 7).
In the 3D reconstructions, each dimer is in contact with two independent dsDNA strands through each monomer. As a result, the DNA is interspersed between two dimers, each corresponding to one start of the helix. In the 5-start structure, these contacts involve one aspartate (D82 relative to the N-terminal methionine in qu_946), one glutamate (E321), three lysines (K84, K344, K685), one arginine (R324) and a histidine (H343) (Fig. 4). The resolution (up to 4.3 Å for the protein shell, Fig. 3E) did not allow the identification of potential ions bridging the DNA phosphates to the protein. We postulated that the distance between the dsDNA strands is defined by each GMC-oxidoreductase, with different residues being exposed to interact with the dsDNA. On the one hand, while there are no clashes between the dimers in the 5-start 3D reconstruction with either GMC-oxidoreductase, there is uninterpreted electron density when using the qu_143 protein model, which uniquely lacks 12 C-terminal residues compared to the others (Extended data Fig. 8). On the other hand, when positioning the two GMC-oxidoreductases into the best 6-start 3D reconstruction, it appeared that only qu_143 could be fitted into the map, due to steric hindrance from the longer C-terminal tail in qu_946. We also noticed that in the 6-start helix, part of the qu_143 model needed to be reoriented to avoid clashes with adjacent molecules.
Even though we did not attempt to optimize the model due to the limited resolution of the 6-start helix map (6 to 8 Å) (Fig. 2), we were able to compare the spacing between the dsDNA strands (39.5 Å in the 5-start, 36.4 Å in the 6-start) and identified two lysine residues (K319, K342), an arginine (R322) and a glutamine (Q649) in the 6-start qu_143 structure that would be ideally positioned to interact with the DNA. In both 5- and 6-start helices, the genome would need to be further compacted as A-form DNA to fit the genomic fibre into the ~350 nm-wide nucleoid. This is consistent with our 3D reconstructions, as only a compact A-form dsDNA would be properly positioned to produce the observed periodic contacts between the protein shell and the dsDNA all along the fibre (Fig. 4A). Thus, despite the conformational heterogeneity and the flexibility of the rod-shaped structure, we were able to compute an atomic model of the Mimivirus genomic fibre within a map with a 4.3 Å maximum resolution (Fig. 3).
The proteomic analysis revealed the presence of additional proteins including several RNA polymerase subunits: Rpb1 and Rpb2 (qu_530/532 and qu_261/259/257/255), Rpb3/11 (qu_493), Rpb5 (qu_245), and Rpb6 (qu_220), in addition to the mRNA capping enzyme (qu_404), a putative regulator of chromosome condensation (qu_366), the core protein (qu_431), several oxidative stress proteins, a 93 amino-acid-long basic protein of unknown function (qu_544) and additional hypothetical proteins (Extended data Table 2). Upon inspection of the negative staining micrographs, macromolecules strikingly resembling the characteristic grooved structure of the poxvirus RNA polymerase 40 were frequently noticed scattered around the unwinding fibre, sometimes sitting on DNA (Extended data Fig. 9). The hollow lumen of the fibre is large enough to accommodate the Mimivirus RNA polymerase, most likely sitting on the highly conserved promoter sequence of early genes 41 [...] The structure of the Mimivirus genomic fibre supports a complex assembly process in which the DNA must first be folded on itself prior to being packaged, a step that could involve the repeat-containing regulator of chromosome condensation (qu_366). As discussed earlier, the composition of the proteinaceous shell, through different contacting residues between the dsDNA phosphates and the two GMC-oxidoreductases, determines the number of dsDNA strands folded into the structure before loading into the nucleoid. The RNA polymerase could then define the helix diameter through its anchoring onto the dsDNA. Additional proteins, such as the 10 kDa basic qu_544 protein with a central hydrophobic segment, could serve as linkers between the genome and the protein shell.
Finally, in addition to their structural roles, the two GMC-oxidoreductases making up the proteinaceous shield could, together with the other oxidative stress proteins identified in the fibre, alleviate the oxidative stress to which the virions are exposed while entering the cell by phagocytosis. The apparent complexity of the flexible genomic fibre, with various unwinding states and two different helical symmetries, calls for additional work to stabilize these complexes in order to identify the contributions of the various proteins composing the fibre and to progress towards higher-resolution 5- and 6-start structures. Alternatively, we could perform the structural study of the genomic fibre of another member of the family encoding a unique GMC-oxidoreductase.
The genomic fibre is additionally folded into the membrane-limited nucleoid 2 and, since it is expelled from the nucleoid as a flexible straight structure, we suspect that an active, energy-dependent process is required to bundle it into the nucleoid. To our knowledge, the structure of the genomic fibre used by Mimivirus to package and protect its genome is unique in the dsDNA viral world and represents the first description of the genome organization of a giant virus. While herpesviruses 45,46 , bacteriophages 47,48 or some archaeal viruses 49

Supplementary Files
This is a list of supplementary files associated with this preprint. SupplementaryInformationFinal.pdf