Structure of an endogenous mycobacterial MCE lipid transporter

To replicate inside human macrophages and cause the disease tuberculosis, Mycobacterium tuberculosis (Mtb) must scavenge a variety of nutrients from the host1,2. The Mammalian Cell Entry (MCE) proteins are important virulence factors in Mtb1,3, where they are encoded in large gene clusters and have been implicated in the transport of fatty acids4–7 and cholesterol1,4,8 across the impermeable mycobacterial cell envelope. Very little is known about how cargos are transported across this barrier, and how the ~10 proteins encoded in a mycobacterial mce gene cluster might assemble to transport cargo across the cell envelope remains unknown. Here we report the cryo-EM structure of the endogenous Mce1 fatty acid import machine from Mycobacterium smegmatis, a non-pathogenic relative of Mtb. The structure reveals how the proteins of the Mce1 system assemble to form an elongated ABC transporter complex, long enough to span the cell envelope. The Mce1 complex is dominated by a curved, needle-like domain that appears to be unrelated to previously described protein structures, and creates a protected hydrophobic pathway for lipid transport across the periplasm. Unexpectedly, our structural data revealed the presence of a previously unknown subunit of the Mce1 complex, which we identified using a combination of cryo-EM and AlphaFold2, and name LucB. Our data lead to a structural model for Mce1-mediated fatty acid import across the mycobacterial cell envelope.


Introduction
Mycobacterium tuberculosis (Mtb), the causative agent of tuberculosis, is one of the leading causes of death due to infectious disease, resulting in over one million deaths annually 9. Mtb establishes a niche within the phagosomal compartment of host macrophages, where it can grow and replicate. To survive in the phagosome, Mtb must scavenge nutrients from the host cell 1,2, and utilizes an ensemble of active transporters to import iron 10,11, lipids 1,2, and other metabolites 12. In particular, the Mammalian Cell Entry (MCE) family of proteins has been implicated in the import of substrates such as fatty acids [4][5][6][7] and cholesterol 1,4,8 across the cell envelope of Mtb and related species such as Mycobacterium smegmatis (Msmeg) (Fig. 1a) 3,13,14. MCE proteins are critical for virulence in Mtb and other bacterial pathogens 1,3,[15][16][17][18][19], underscoring their fundamental importance for nutrient acquisition from the host. To mediate the uptake of fatty acids and cholesterol, MCE transporters must translocate substrates across the impenetrable cell envelope, which consists of: 1) the inner membrane (IM), 2) the complex mycobacterial outer membrane (MOM), and 3) a periplasmic space between the IM and MOM, containing the cell wall 20. In Gram-negative bacteria, many cargos are transported via large transenvelope protein-based machines that mediate the passage of substrates across membranes and the periplasmic space, such as the LPS export system [21][22][23][24][25][26], antibiotic efflux pumps 27,28, and a variety of specialized protein secretion systems 29. In contrast, it is unclear how substrates are transported across the highly divergent mycobacterial cell envelope, whether such periplasm-spanning complexes exist, and how active transporters such as the MCE transporters may facilitate substrate transport in mycobacteria.
across the periplasm. However, the elongated tunnel of Mce1 is structurally divergent from proteins characterized to date (Extended Data Fig. 5a), and to our knowledge is the first structure of such a periplasm-spanning transport system in mycobacteria (Extended Data Fig. 5b).

The portal creates an entrance to the transport pathway
Substrates for import from the MOM may enter the Mce1 complex through the portal domain (Fig. 3a), which is composed of a small six-stranded β-barrel (Fig. 3b) surrounded by non-canonically structured regions (Extended Data Fig. 6a,b). Apart from the β-barrel motif, the portal domain has no apparent homology to any known protein domains. The C-terminus of each MCE protein contributes a single β-strand to the formation of the β-barrel, and also provides a portion of the surrounding non-canonical regions. Although the portal is formed from six homologous MCE proteins (Mce1A-Mce1F), the C-terminal regions of the MCE subunits are structurally distinct and vary widely in length (Extended Data Fig. 6a,b). The lumen of the β-barrel is aligned with the tunnel and has a hydrophobic interior, potentially acting as an entry point for substrates (Fig. 3c). While this β-barrel is formed from just 6 strands, the high tilt of its β-strands results in a barrel diameter similar to that of the 8-stranded fatty acid binding phospholipase PagP found in the E. coli outer membrane 39. In our structure, passage through the β-barrel is blocked by a few loosely packed hydrophobic side chains that protrude into the lumen. Whether and how opening occurs is unclear, but relatively subtle side chain rearrangements may be sufficient to open a pore large enough for a fatty acid to thread through.

The needle forms a unique tunnel assembly to facilitate transport of substrates
The portal feeds directly into a tunnel created by the needle, a unique α-helical structure that is strikingly curved. Our EM data for Mce1 suggest that the curved needle is fairly rigid, and we do not observe straight or alternatively-curved states. The needle curvature likely arises from the asymmetric, heterohexameric assembly of the MCE proteins, but its functional role is not immediately clear. Each MCE subunit contains eight copies of a helical repeat motif, separated by well-defined kinks (Fig. 3d, Extended Data Fig. 6a). The helical segments from Mce1ABCDEF twist around each other to form a left-handed superhelix with a pitch of ~75 Å and almost exactly two complete turns (Fig. 2d). The first helical repeats from each MCE subunit associate to form a 6-helix bundle. Similarly, repeats 2, 3, 4, 5, 6, 7, and 8 associate to form separate 6-helix bundles, for a total of eight structurally similar modules (Extended Data Fig. 6c). These eight modules stack on top of each other to make a long, needle-like tube, and are connected by short linkers (Fig. 3d). The 6-helix bundles appear to be unrelated to previously described folds, such as 6-helix coiled-coils 40.
The inside of the needle contains a long tunnel, ~7,000 Å³ in volume, with an inner diameter ranging from 7-11 Å. The tunnel is lined with hydrophobic residues, potentially providing a sheltered passageway for fatty acids to cross the periplasm (Fig. 3e, Extended Data Fig. 6d). Numerous strong densities are present in the needle, which may correspond to bound substrates (Fig. 3f). The resolution of these densities is too low to unambiguously identify the ligand, but the size and shape are consistent with fatty acid chains that range from 10 to 49 carbons in length (Extended Data Fig. 4b). In many places, 3-5 fatty acid-like densities appear to run parallel to each other along the long axis of the needle, suggesting that multiple substrates may be transported "in bulk" through the tunnel. One of the largest and most prominent densities is located in the needle just below the portal domain, where a loop from Mce1E protrudes into the lumen and partially occludes the otherwise broad and featureless tunnel (Figs. 3b,c). The constriction in the tunnel formed by this loop may create a fatty acid binding site reminiscent of the high-affinity site in the long-chain fatty acid transporter, FadL 41. In our structure, strong density for a possible mycolic acid substrate (49 carbons) fills the area surrounding this loop (Fig. 3c), consistent with a possible role of Mce1 in mycolic acid recycling and MOM maintenance 7. This binding site, just beyond the β-barrel entrance, may be involved in substrate selection, occurring prior to transport through the tunnel.

MCE ring connects needle to an ABC transporter
The hydrophobic tunnel through the needle leads to a pore through the ring, which is formed by six MCE domains (Fig. 4a). Each MCE domain in the ring is structurally similar (Extended Data Fig. 7a) but the domains are only ~17% identical to one another at the sequence level (Extended Data Fig. 7b), leading to a heterohexameric ring with the following arrangement: (Fig. 4b). This contrasts with the rings observed in other MCE protein assemblies, including LetB, PqiB, and MlaD, which are homohexameric and approximately six-fold symmetric 42,43 . The pore of the Mce1 ring is formed by a pore-lining loop (PLL) from each MCE domain (Fig. 4b, Extended Data Fig. 7c). The arrangement of the PLLs may form a gate between the periplasmic needle assembly and the substrate-binding pocket of the ABC transporter below (Fig. 4c). In our structure, the pore through the ring is closed, and a conformational change is likely required to allow passage of substrates into the ABC transporter. Opening and closing of the tunnel through MCE rings has been observed previously in LetB and PqiB 42,43 , and may also occur in the Mce1 ring.

ABC transporter in the inner membrane is poised to accept substrates from MCE ring
The pore through the MCE ring leads to the ABC transporter in the IM, which consists of a heterodimer of permease proteins, YrbE1A and YrbE1B, and a homodimer of the ATPase MceG (Fig. 4a) Fig. 8d). This extension may stabilize the tilted state, possibly playing a role in coupling conformational changes in the ABC transporter to MCE ring opening/closing. In contrast to the homodimer found in most bacterial ABC transporters, the YrbE1AB heterodimer could facilitate the recognition of asymmetric substrates 47.
In our structure, YrbE1AB adopts an outward-open state, with a narrow substrate-binding pocket of ~150 Å³ that is formed between the YrbE subunits (Figs. 2d,4c). Density for an elongated ligand, resembling a fatty acid, is observed extending upwards from the substrate binding pocket (Fig. 4c). An MceG ATPase is bound to each YrbE subunit, forming a homodimer (Fig. 4e). Each MceG contains a ~120 amino acid C-terminal extension that is much longer than those of canonical ABC transporters.
This extension consists of several α-helices connected by flexible linkers that interact with the neighboring MceG subunit (Fig. 4e). Cholesterol growth assays with MceG mutants demonstrate that the C-terminal extension and its interaction with the neighboring subunit are important for function (Fig. 4f) Table 4). The additional subunit, found only in Class 1, lies almost entirely within the transmembrane region, and consists of 4 TM helices (Fig. 5b). Examination of our MceG-GFP mass spectrometry data did not suggest an obvious candidate protein consistent with our EM density (Supplementary Tables 2,3). To identify this unknown subunit, we built a polyalanine model into the density and used these coordinates to perform a structure-based search of the Protein Data Bank and AlphaFold Protein Structure Database 56 using Foldseek (Fig. 5b) 57. While no proteins with similar structure were identified in the Protein Data Bank, the search of the AlphaFold database revealed predicted structures that matched our polyalanine model well, including MSMEG_3032 and its Mtb homolog Rv2536 58 (~61% identical) (Fig. 5c). Fitting the AlphaFold2 MSMEG_3032 model into our EM density required minimal adjustment apart from a few sidechain rotamer changes, supporting the assignment of MSMEG_3032/Rv2536 as a novel component of the Mce1 system (Fig. 5d, Extended Data Fig. 4c). Based upon a possible role as a Lipid Uptake Coordinator, analogous to the proposed role of LucA 4, we rename MSMEG_3032/Rv2536 to LucB. To validate the interaction identified from our structure, we assessed whether LucB pulled down MCE transporter components. We constructed an Msmeg strain with chromosomally tagged LucB-GFP, and purified the protein by anti-GFP affinity and size exclusion chromatography (Extended Data Figs. 9a,b).
Negative stain electron microscopy of the resulting sample reveals particles with characteristic shape and features of the LucB, for which there is a single paralog in Msmeg and Mtb, is a protein of unknown function and has not previously been linked to MCE transporters. Orthologs of this protein can be found in bacteria of the Actinomycetales order, particularly in the families Gordoniaceae, Mycobacteriaceae, Nocardiaceae, Pseudonocardiaceae, and Tsukamurellaceae (Extended Data Fig. 10b). Interestingly, LucB orthologs appear to be found only in double-membraned bacteria containing Mtb-like mce operons 8, with a conserved 8-gene cluster encoding two distinct YrbE and six distinct MCE proteins. Conversely, orthologs of LucB are not found in genomes that encode simpler MCE gene clusters encoding single YrbE and MCE protein subunits, such as those found in E. coli. This observation, coupled with our data, suggests that LucB may have evolved to function specifically with heterooligomeric MCE transporters that arose in the actinobacterial lineage, and may be involved in the regulation of activity in these transporters.

Discussion
The mycobacterial cell envelope is highly complex and divergent from its Gram-negative counterparts. Mechanisms for how substrates are transported across the mycobacterial cell envelope have remained elusive. Our high-resolution structure of an endogenous Mce1 transport complex allows us to propose a model for how this important virulence factor may work to import substrates (Fig. 6, Supplementary Video 2). First, fatty acids or mycolic acids from the MOM may enter through the β-barrel of the portal domain, either directly or mediated by additional unknown factors in the MOM. How the Mce1 complex recognizes specific substrates is unclear, but one possibility is that substrate selection occurs at the apparent fatty acid binding site noted just below the β-barrel of the portal domain. After entering the complex, the substrates travel across the periplasm through the hydrophobic tunnel created by the curved Mce1ABCDEF needle, in which several substrates may be accommodated simultaneously. At the base of this needle, the ring of MCE domains must undergo a conformational change, opening the central pore to allow substrate entry into the IM ABC transporter. ATP hydrolysis by MceG likely drives conformational changes in the YrbE1AB subunits to translocate substrates into the cytoplasm or IM. LucB, which we show binds to Mce1C, may play a role as a regulator, or a scaffold protein to recruit other parts of the system that are not yet known. While LucB is not structurally related to LucA, both are small transmembrane proteins that may regulate MCE systems.
Our data provide a structural framework for how mycobacteria may use MCE systems to scavenge resources, such as fatty acids, from the host cell by providing a tunnel for the transport of substrates across the cell envelope without
Rice, and Bing Wang from the NYU Cryo-EM Laboratory for assistance with cryo-EM grid screening and microscope operation; Sean Mulligan and Lauren Vega at the Pacific Northwest Center for Cryo-EM for assistance with cryo-EM data collection. EM data processing has utilized computing resources at the HPC Facility at NYU, and we thank the HPC team for high-performance computing support. We thank the Central Lab Services team at NYU School of Medicine for preparation of media and buffers.
Cultures were grown at 37 °C with shaking and OD600 was monitored for each strain using a plate reader (BioTek). At least three biological replicates were conducted and plotted using Prism (GraphPad).

Bacterial growth and protein purification
Msmeg was grown in Middlebrook 7H9 supplemented with 0.05% (v/v) Tween 80 and additional antibiotics as needed (e.g. 50 μg/mL hygromycin). For protein expression and purification of chromosomally GFP-tagged MceG (bBEL591) or GFP- were washed three times with PBST and imaged using a LI-COR (LI-COR Biosciences) and analyzed by ImageJ 75.

Negative stain electron microscopy
To prepare grids for negative stain electron microscopy, a fresh sample of either MceG-GFP or LucB-GFP was applied to a freshly glow discharged (30 seconds) carbon coated 400 mesh copper grid (Ted Pella Inc., cat. #01754-F) and blotted off.
Immediately after blotting, a 2% uranyl formate solution was applied for staining and blotted off on filter paper. Application and blotting of stain was repeated five times. Samples were allowed to air dry before imaging. Data were collected on a Talos L120C TEM (FEI) equipped with a 4K x 4K OneView camera (Gatan) at a nominal magnification of 73,000x, corresponding to a pixel size of 2.00 Å/pixel on the specimen, and a defocus range of -1 to -2 μm. For LucB-GFP data, data processing was carried out in cryoSPARC v3.3.1 60. Micrographs were imported, and particles were picked manually to serve as templates for Template Picking. Particles picked by template picking were sorted using 2D Classification.

Sample preparation for mass spectrometry
Protein samples from wild-type Msmeg cells (strain mc²155, bBEL246), the MceG-GFP strain (bBEL591), and the LucB-GFP strain (bBEL595) were purified using the protein purification method described above. Three biological replicates were performed for each strain and analyzed by mass spectrometry. Affinity purified proteins were reduced with DTT at 57 °C for 1 hour ( were allowed to trigger an MS2 scan.

Analysis of mass spectrometry data
The MS/MS spectra were searched against the NCBI Mycobacterium smegmatis database, supplemented with common lab contaminants and the sequences of the tagged bait proteins, using SEQUEST within Proteome Discoverer 1.4 (Thermo Fisher).
The search parameters were as follows: mass accuracy better than 10 ppm for MS1 and 0.02 Da for MS2, two missed cleavages, fixed modification carbamidomethyl on cysteine, variable modification of oxidation on methionine and deamidation on asparagine and glutamine. The data were filtered using a 1% FDR cutoff for peptides and proteins against a decoy database, and only proteins with at least 2 unique peptides were reported in Supplementary Table 2.
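The filtering criteria above (a 1% FDR cutoff against a decoy database, followed by a minimum of 2 unique peptides) can be illustrated with a minimal target-decoy sketch. This is not the Proteome Discoverer implementation, whose internals are not described here. The ProteinHit record, its score field, and the filter_hits helper are hypothetical names introduced only for illustration, and the FDR is estimated simply as the number of decoys divided by the number of targets among the top-ranked hits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinHit:
    accession: str        # hypothetical protein identifier
    score: float          # hypothetical search score; higher is better
    unique_peptides: int  # number of unique peptides supporting the hit
    is_decoy: bool        # True if the hit comes from the decoy database


def filter_hits(hits: List[ProteinHit],
                fdr_cutoff: float = 0.01,
                min_unique_peptides: int = 2) -> List[ProteinHit]:
    """Keep target hits passing an FDR cutoff and a unique-peptide minimum."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)

    # Find the longest score-ranked prefix whose estimated FDR
    # (decoys / targets) is still at or below the cutoff.
    n_keep = 0
    targets = 0
    decoys = 0
    for rank, hit in enumerate(ranked, start=1):
        if hit.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= fdr_cutoff:
            n_keep = rank

    # Report only target proteins with enough unique peptides.
    return [h for h in ranked[:n_keep]
            if not h.is_decoy and h.unique_peptides >= min_unique_peptides]


if __name__ == "__main__":
    example = [
        ProteinHit("T1", 100.0, 5, False),
        ProteinHit("T2", 90.0, 1, False),
        ProteinHit("T3", 80.0, 3, False),
        ProteinHit("D1", 70.0, 2, True),
        ProteinHit("T4", 60.0, 4, False),
    ]
    for hit in filter_hits(example):
        print(hit.accession, hit.unique_peptides)
```

In this toy input, the first decoy raises the estimated FDR above 1%, so only hits ranked above it are considered, and the hit supported by a single unique peptide is then dropped.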
To obtain a probabilistic score (SAINT score) that a protein is an interactor of either MceG or LucB, the data were analyzed using the SAINT Express algorithm 59 Fig. 9e (for LucB), respectively, using Prism (GraphPad).

Cryo-EM sample preparation
The MceG-GFP complex was freshly purified as described above. Gel filtration fractions corresponding to higher-molecular  Table 4.

Cryo-EM data processing
The dataset was split up into batches of 1,000 movies (45 batches total) and processed in cryoSPARC v3.3.1 60 16,918 particles)). For a more isotropic reconstruction in 3D, the 1,268 particles from Ref1 were sorted in 2D (N = 10) and different views of the particles were selected individually: side (588 particles), tilted (505 particles), top (43 particles). These selected particles were used to generate Topaz models to specifically pick side, tilted, and top views of the particle through the Topaz Train module (expected number of particles = 300, use pretrained initialization, ResNet16).
Using these Topaz picking models, separate Topaz Extract jobs were performed for each view, particles were extracted (box Local refinements were performed for each class by recentering the particles on the region of interest using the cryoSPARC Volume Alignment Tool, re-extracting the particles with the new center (box size = 360 px, unbinned), refining the particles on the re-centered 3D template using Non-Uniform Refinement, performing particle subtraction in cryoSPARC using a mask around the region of interest, and refining the subtracted particles using cryoSPARC Local Refinement. This procedure was performed on each class to generate locally refined maps for the following regions:  (173,315 particles, 3.19 Å). To generate a composite map, particles from each class were re-extracted with a box size of 640 px (unbinned) and refined using Non-Uniform Refinement to generate maps that included the entire complex (Map1e for Class 1 and Map2e for Class 2). These maps were used as a template to stitch the locally refined maps together to generate a composite density map. In regions aside from the extra density (later assigned as LucB/MSMEG_3032), these maps were lower resolution compared to the map from the consensus set of particles before classification, but did not show any notable differences compared with the consensus map. Therefore, to generate maps for model building, local refinements were performed on the consensus set of particles in a similar fashion, but with the MSMEG_3032/LucB density masked out.
Local refinements were performed on the set of particles from the consensus refinement using the same approach that was applied to Class 1 and Class 2. This procedure was utilized on the following regions: To generate a composite map, the consensus set of particles was also re-extracted with a box size of 640 px (unbinned) and refined using Non-Uniform Refinement to generate a map that included the entire complex (Map0e). This map was used as a template to stitch the locally refined maps together to generate a composite density map. These maps were of much higher quality than the locally refined maps of Class 1 and Class 2, and were thus used for initial model building.
For each map, the overall resolution reported in cryoSPARC was estimated using the gold-standard Fourier Shell Correlation criterion (FSC = 0.143). Directional FSCs were computed using 3DFSC 65. Local resolution maps were computed using the cryoSPARC Local Resolution Estimation module. Locally refined maps were combined into composite maps for the consensus map, Class 1 and Class 2 using PHENIX v1.20.1 'Combine Focused Maps' module 64. Composite maps were generated for sharpened maps and half maps (for calculating FSC and estimating local resolution of the composite maps).
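As an illustration of how a resolution value is read from an FSC curve at the FSC = 0.143 threshold, the sketch below finds the first shell at which the curve drops below the threshold and converts the corresponding spatial frequency to a resolution in Å. It is a simplified stand-in, not the cryoSPARC procedure: the function name resolution_at_threshold, the use of linear interpolation between adjacent shells, and the example curve are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def resolution_at_threshold(spatial_freqs: Sequence[float],
                            fsc_values: Sequence[float],
                            threshold: float = 0.143) -> Optional[float]:
    """Return the resolution (Å) where the FSC first falls below threshold.

    spatial_freqs are shell frequencies in 1/Å, in increasing order, and
    fsc_values are the FSC values for the same shells. The crossing point
    is located by linear interpolation between the two bracketing shells.
    Returns None if the curve never crosses the threshold from above.
    """
    for i in range(1, len(fsc_values)):
        prev_fsc, curr_fsc = fsc_values[i - 1], fsc_values[i]
        if prev_fsc >= threshold > curr_fsc:
            # Fraction of the way from shell i-1 to shell i at the crossing.
            frac = (prev_fsc - threshold) / (prev_fsc - curr_fsc)
            freq = spatial_freqs[i - 1] + frac * (
                spatial_freqs[i] - spatial_freqs[i - 1])
            return 1.0 / freq
    return None


if __name__ == "__main__":
    # Made-up FSC curve, for demonstration only.
    freqs = [0.10, 0.20, 0.30, 0.40]
    fsc = [1.00, 0.90, 0.243, 0.043]
    print(resolution_at_threshold(freqs, fsc))
```

Because the reported resolution depends on where the curve crosses the threshold, a curve that never falls below 0.143 within the sampled shells yields no estimate in this sketch.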
For the consensus composite map, maps 0a, 0b, 0c, and 0d were combined using Map0e as a template to generate Map0.
For the class 1 composite map, maps 1a, 1b, 1c, and 1d were combined using Map1e as a template to generate Map1. For the class 2 composite map, maps 3a, 3b, 3c, and 3d were combined using Map2e as a template to generate Map2. Global Starting models were fitted into their corresponding locally-refined maps using the "Fit in Map" function in UCSF Chimera. For each map, the PDB was trimmed to remove regions of the protein that were not defined in the map. Rigid-body fitting into the cryo-EM maps was performed using PHENIX. Fitted models were visually inspected and manually adjusted in COOT. Real-space refinement with Ramachandran and secondary structure restraints was carried out in PHENIX using 5 cycles and 100 iterations to optimize the fit and reduce clashes. These models were iteratively inspected, manually rebuilt in COOT and refined in PHENIX until completion. Models built into the locally refined maps were aligned and stitched together in PyMOL. These models served as templates to generate a composite density map (Map0) for the consensus set of particles using the PHENIX 'Combine Focused Maps' module.
Models for Map1 and Map2 were built using the model for Map0 as the starting model. The Map0 model was fitted and trimmed into the locally refined maps of each class in UCSF Chimera and PyMOL. Real-space refinement with Ramachandran and secondary structure restraints was carried out in PHENIX. Models were iteratively inspected, manually rebuilt in COOT, and refined in PHENIX until completion. For Class 1, extra protein density was observed near the TM of Mce1C in the inner membrane region of Map1b that corresponded to an additional subunit bound to the complex, LucB. To determine the identity of this unknown protein, we used a combination of model building and AlphaFold2. The Cα backbone of the polypeptide was traced manually in COOT. This Cα model was used to search structural databases (AlphaFold/Swiss-Prot v2, AlphaFold/Proteome v2, PDB100 211201, GMGCL 2204) using TM-align mode in Foldseek 57. One of the highest-ranking hits from this search (TM-score 0.9509) was a putative, conserved, integral membrane protein from Mycobacterium tuberculosis (Rv2536, AF-P95017-F1-model_v2.pdb) found in the AlphaFold Protein Structure Database. The structure of the Msmeg ortholog of this protein (MSMEG_3032/LucB, AFpdb22) was predicted in ColabFold, docked into the cryo-EM density using Chimera, stitched into the model of Map1 using PyMOL, and refined in PHENIX. Completed locally refined models were then aligned and stitched together in PyMOL and used to generate a composite density map for Class 1 (Map1) and Class 2 (Map2) in PHENIX. Ligands were added to stitched models for Map1 and Map2 and models were real-space refined using PHENIX. Table 4
Color key is shown in Fig. 3a. f, Ligand density (magenta) from Map0 in the region of the needle boxed in Fig. 3e and indicated in the top-left inset.
spheres. f, Cholesterol growth curves of ΔmceG strain complemented with plasmids containing the following Msmeg MceG mutants: 1) MceG Δ294-360 (pale green) and 2) MceG Y178A (red).
Wild-type Msmeg mc²155 strain (WT, black), ΔmceG (blue), and ΔmceG complemented with wild-type mceG (ΔmceG+comp, purple) are shown as controls. Growth assays were repeated three times (n = 3) with similar results. Plotted data are the mean of three replicates and standard error bars are shown.

Figure 5
LucB is an accessory factor that binds the Mce1 complex. a-d, Workflow to identify unknown protein subunit for which density is observed in Class 1 (Map1). a, Composite density maps for Class 1 (Map1) and Class 2 (Map2) colored by protein subunit. Color key is shown above. Pink density is observed only in Class 1. b, Cα backbone manually built into extra density observed in Class 1. 3D model of poly-alanine Cα backbone was used as a search model for Foldseek 57. Protein is colored in rainbow colors (N-terminus, blue; C-terminus, red), and corresponding cryo-EM density is shown as a transparent grey surface. c, (left) AlphaFold2 prediction of LucB identified from Foldseek 57 search. Model is colored by prediction confidence; the N-terminal domain is predicted with high confidence 63