No statistical methods were used to predetermine sample size. The experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment.
Bacterial strain construction
Mycobacterium smegmatis (Msmeg) strains were generated by oligonucleotide-mediated recombineering followed by Bxb1 integrase targeting (ORBIT)37. An expression plasmid (pKM444, Addgene #108319, for tagging or pKM461, Addgene #108320, for knockouts)37 containing the Che9c phage RecT annealase and Bxb1 integrase was electroporated into electrocompetent Msmeg cells (mc2155 strain73) and protein expression was induced with 500 ng/mL anhydrotetracycline (ATc, Sigma, cat. #31741). For chromosomal tagging, the induced cells were made electrocompetent and subsequently co-transformed with pBEL2108 (a derivative of payload plasmid pKM468 (Addgene #108434)37 containing a 3C protease cleavage site upstream of the eGFP tag) and a targeting oligonucleotide. The MceG-GFP strain (bBEL591) was generated with a 3C-eGFP-4xGly-TEV-Flag-6xHis tag on the C-terminus of MceG (MSMEG_1366) using the following oligo (IDT Ultramer DNA Oligo): 5’-GTTGCCCGCGCGCCGGCCCCTTGAGACACGTCAGGCCGGGCCGTGACGGCCCGGCCTGATCGCGGCAAACTCAGGTTTGTACC
GTACACCACTGAGACCGCGGTGGTTGACCAGACAAACCCGCCTGCTTGGGCACCTCGATGACGCCCGTCGGCGAGTCGTCGTA
GTTCTCGACGGGCGCGGTGGCGGCC-3’. LucB-GFP (bBEL595) strain was generated with a 3C-eGFP-4xGly-TEV-Flag-6xHis tag on the C-terminus of LucB (MSMEG_3032) using the following oligo (IDT Ultramer DNA Oligo): 5’-CACGATGTGTGACGCTACTCGCTACGCTGTGCCCCCATGAGCAAGTGGTTACTGCGCGGAGTGGTGTTCGCAGGTTTGTCTGGTC
AACCACCGCGGTCTCAGTGGTGTACGGTACAAACCCCGCTGGAGAATCCGGACCAGCCGCGTCAGAGCTGATCCGGGCTCAGC
TTCACAAACGAGAGTTGTTGTGGT-3’. Transformants were plated on either LB+agar (Luria-Bertani, Difco cat. #DF0446-07-5) or 7H10 (Difco, cat. #DF0627-17-4) plates containing 50 µg/mL hygromycin (GoldBio, cat. #H-270) and incubated at 37 °C for 3-5 days. Colonies were verified for insertion of the payload plasmid by PCR and subsequently confirmed by whole genome resequencing (SeqCenter).
For knockout strains, electrocompetent induced cells were co-transformed with pKM464 (Addgene # 108322)37 and a targeting oligo. The ΔmceG strain (bBEL594) harboring a deletion of mceG (MSMEG_1366) was generated using the following oligo (IDT Ultramer DNA Oligo): 5’-CCGTGACGGCCCGGCCTGATCGCGGCAAACTCACGCCTGCTTGGGCACCTCGATGACGCCGGTTTGTACCGTACACCACTGAGA
CCGCGGTGGTTGACCAGACAAACCCAACCCCGTCACGTCGATTTGGACGCCCATCAAAGATCCTTCCCGCTACGCCTACCACAC-
3’. Transformants were plated on 7H10 plates containing 50 µg/mL hygromycin and incubated at 37 °C for 3-5 days. Colonies were verified for insertion of the payload plasmid by PCR and subsequently confirmed by whole genome resequencing (SeqCenter).
Complementation plasmid construction
For complementation of the ORBIT-constructed mceG knockout (bBEL594), a derivative of pMV261zeo (a gift from Jeffory Cox at University of California, Berkeley) containing wild-type mceG was cloned (pBEL2759). The coding sequence of mceG was amplified from genomic DNA extracted from Msmeg cells using AccuPrime Pfx DNA Polymerase (Invitrogen, cat. #12344032) and cloned into pMV261zeo using Gibson assembly. TOP10 cells (Invitrogen, cat. #C404010) were transformed with the assembled vector using heat shock and plated on LB+agar plates containing 25 µg/mL zeocin (Gibco, cat. #R25001). Colonies were screened for correct DNA sequences using Sanger sequencing (Azenta). Complementation plasmids harboring MceG mutants were generated in a similar manner (pBEL2713, MceG(Y178A); pBEL2719, MceG(Δ242-360)).
Complementation plasmids were electroporated into electrocompetent ΔmceG Msmeg cells. Cells were plated on 7H10 plates containing appropriate antibiotics (e.g. 25 µg/mL zeocin, 50 µg/mL hygromycin). Colonies were selected, cultured in Middlebrook 7H9 (Difco, cat. #271310) containing 0.05% (v/v) Tween 80 (Sigma, cat. #P1754) and appropriate antibiotics, and frozen as 20% glycerol stocks for future use.
Cholesterol growth assay
The cholesterol growth assay was adapted from previous studies14,35. Briefly, Msmeg strains were streaked from frozen glycerol stocks onto 7H10 plates supplemented with 0.05% (v/v) Tween 80 and the appropriate antibiotics. Colonies were used to seed M9 medium (1 L dH2O, 12.8 g Na2HPO4, 3 g KH2PO4, 0.5 g NaCl, 1 g NH4Cl, 25 μL 1 M CaCl2, 500 μL 1 M MgSO4) supplemented with 0.5% glycerol and 0.05% (v/v) tyloxapol (Ty, Sigma, cat. #T0307) with appropriate antibiotics. M9 cultures were grown to an OD600 of ~0.7-1.0 at 37 °C and harvested. Strains were washed twice by pelleting cells by centrifugation at 4,000 rcf for 5 min at 22 °C and resuspending in M9 medium with 0.05% tyloxapol. After the wash steps, strains were resuspended in M9 medium with 0.05% tyloxapol to an OD600 of 0.1 and were used to seed 200 µL cultures (starting OD600 of 0.005) for growth in 96-well plates. For each strain, the following media were used: 1) M9+0.05% Ty+0.5% (v/v) glycerol (carbon source positive control), 2) M9+0.05% Ty+0.009 g/mL methyl-β-cyclodextrin (MBC, Sigma, cat. #C4555) (no carbon source control), and 3) M9+0.05% Ty+0.009 g/mL MBC+0.69 mM cholesterol (Sigma, cat. #C8667). Cultures were grown at 37 °C with shaking and OD600 was monitored for each strain using a plate reader (BioTek). At least three biological replicates were conducted and plotted using Prism (GraphPad).
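The seeding step above amounts to a simple C1V1 = C2V2 dilution. The following minimal sketch works out the volumes for one well; it assumes OD600 scales linearly with cell density over this range, and the function name and per-well volumes are illustrative rather than taken from the protocol.

```python
def dilution_volumes(stock_od, target_od, final_volume_ul):
    """Return (inoculum_ul, medium_ul) satisfying C1*V1 = C2*V2.

    Assumes OD600 is proportional to cell density (an assumption,
    reasonable only for dilute cultures).
    """
    if not 0 < target_od <= stock_od:
        raise ValueError("target OD must be positive and not exceed stock OD")
    inoculum_ul = target_od * final_volume_ul / stock_od
    return inoculum_ul, final_volume_ul - inoculum_ul


# Values from the text: washed cells at OD600 0.1, wells started at OD600 0.005, 200 µL per well.
inoculum, medium = dilution_volumes(stock_od=0.1, target_od=0.005, final_volume_ul=200)
print(f"{inoculum:.1f} µL washed cells + {medium:.1f} µL medium per well")
```

Under these assumptions each well corresponds to a 1:20 dilution of the washed cell suspension.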
Bacterial growth and protein purification
Msmeg was grown in Middlebrook 7H9 supplemented with 0.05% (v/v) Tween 80 and additional antibiotics as needed (e.g. 50 µg/mL hygromycin). For protein expression and purification of chromosomally GFP-tagged MceG (bBEL591) or GFP-tagged LucB (bBEL595), overnight cultures of each strain were diluted 1:1000 and grown with shaking at 37 °C and 200 rpm to an OD600 of 0.8-1.2. Cells were harvested by centrifugation at 4,000 rcf, 4 °C. Pellets were resuspended in lysis buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 5 mM MgSO4, 5 mM 6-aminocaproic acid (Sigma, cat. #A2504), 5 mM benzamidine (Sigma, cat. #B6506) and 1 mM phenylmethylsulfonyl fluoride (PMSF, Sigma, cat. #10837091001)) and stored at -80 °C. Cells were thawed at room temperature and lysed by four passes through a chilled Emulsiflex-C3 cell disruptor (Avestin) at an output pressure of 20 kpsi. Unlysed cells and debris were removed by centrifugation at 39,000 rcf for 30 min at 4 °C. Membranes from the resulting supernatant were pelleted by ultracentrifugation in a Fiberlite F37L-8 x 100 Fixed-Angle Rotor (Thermo Scientific, cat. #096-087056) at 37,000 rpm for 90 min at 4 °C and resuspended in membrane resuspension (MR) buffer (50 mM Tris-HCl pH 7.5, 15% (v/v) glycerol, 5 mM MgSO4, 150 mM NaCl, 5 mM 6-aminocaproic acid, 5 mM benzamidine, and 1 mM PMSF). Resuspended membranes were stored at -80 °C. For affinity purification, membranes were thawed and solubilized overnight at 4 °C with addition of 20 mM n-dodecyl-β-D-maltoside (DDM, Inalco, cat. #D310S), and insoluble material was removed by centrifugation at 37,000 rpm for 60 min. GFP affinity resin was prepared using a method adapted from Pleiner et al.74. Briefly, purified His14-Avi-SUMOEu1-anti GFP nanobody (expressed from pTP396, Addgene #149336)74 was biotinylated using BirA (expressed from pTP264, Addgene #149334)74 and further purified using a Superdex 200 16/60 gel filtration column (Cytiva, cat. #28-9909-44) equilibrated in GF1 buffer containing 50 mM Tris/HCl pH 7.5, 200 mM NaCl, 1 mM dithiothreitol (DTT, Amresco, cat. #M109). The biotinylated anti-GFP nanobody was added to Pierce High Capacity Streptavidin Agarose Resin (Thermo Scientific, cat. #20359) equilibrated in GF1 buffer and allowed to incubate with the resin overnight at 4 °C. A 0.6 mL bed volume of resin was washed three times with GF1 buffer and blocked by incubation with 100 μM biotin (Sigma, cat. #B4501) in 50 mM HEPES/KOH pH 7.5 for 5 min on ice with occasional mixing. Beads were washed three times with GF1 buffer and subsequently washed three times with MR buffer containing 20 mM DDM prior to use. Solubilized membranes were incubated with the equilibrated GFP affinity resin at 4 °C for 6 hours and then washed three times with 125 column volumes of membrane wash (MW) buffer (50 mM Tris-HCl pH 7.5, 15% (v/v) glycerol, 5 mM MgSO4, 150 mM NaCl, 5 mM 6-aminocaproic acid, 5 mM benzamidine, 1 mM DDM and 1 mM PMSF). Immobilized proteins were eluted by incubation with 1 mL of 250 nM SENPEuB protease (expressed and purified from pAV286 (Addgene #149333))74 overnight at 4 °C. Eluted proteins were pooled and concentrated before separation on a Superdex 200 16/60 column (GE Healthcare) equilibrated in GF2 buffer (50 mM Tris-HCl pH 7.5, 5 mM MgSO4, 150 mM NaCl, 1 mM DDM, and 1 mM DTT). Fractions containing GFP-tagged MceG or GFP-tagged LucB were buffer-exchanged into storage buffer (50 mM Tris-HCl pH 7.5, 20% (v/v) glycerol, 5 mM MgSO4, 150 mM NaCl, 1 mM DDM, and 1 mM DTT) and stored separately at -80 °C.
Western blot for detection of GFP
Purified protein fractions were separated on a Mini-PROTEAN TGX Stain-Free protein gel (Bio-Rad Laboratories, Inc.). Separated protein bands were visualized using the “Stain Free Gel” application mode on a ChemiDoc MP Imaging System (Bio-Rad Laboratories, Inc.). Proteins were transferred from the gel to a nitrocellulose membrane (Bio-Rad, cat. #1704271) using a Trans-Blot Turbo Transfer System (Bio-Rad Laboratories, Inc.). Membranes were blocked in PBST containing 5% milk for 30 min at 22 °C. The membranes were then incubated with primary antibodies against GFP (custom anti-GFP rabbit polyclonal, provided by the Foley lab, Memorial Sloan Kettering Cancer Center, at a dilution of 1:5,000) and His (mouse anti-penta-His antibody, Qiagen, cat. #34660, at a dilution of 1:10,000) in PBST + 5% milk overnight at 4 °C. The membranes were washed three times with PBST and were incubated with goat anti-rabbit IgG polyclonal antibody (IRDye 800CW, LI-COR Biosciences, cat. #925–32211, at a dilution of 1:10,000) and goat anti-mouse IgG polyclonal antibody (IRDye 680RD, LI-COR Biosciences, cat. #926-68070, at a dilution of 1:10,000) as the secondary antibodies in PBST + 5% milk for 1 hr at 22 °C. The membranes were washed three times with PBST, imaged using a LI-COR (LI-COR Biosciences), and analyzed in ImageJ75.
Negative stain electron microscopy
To prepare grids for negative stain electron microscopy, a fresh sample of either MceG-GFP or LucB-GFP was applied to a freshly glow-discharged (30 seconds) carbon-coated 400 mesh copper grid (Ted Pella Inc., cat. #01754-F) and blotted off. Immediately after blotting, a 2% uranyl formate solution was applied for staining and blotted off on filter paper. Application and blotting of stain was repeated five times. Samples were allowed to air dry before imaging. Data were collected on a Talos L120C TEM (FEI) equipped with a 4K x 4K OneView camera (Gatan) at a nominal magnification of 73,000x, corresponding to a pixel size of 2.00 Å/pixel on the specimen, and a defocus range of -1 to -2 μm. For LucB-GFP data, data processing was carried out in cryoSPARC v3.3.160. Micrographs were imported and particles were picked manually to serve as templates for Template Picking. Particles picked by Template Picking were sorted using 2D Classification.
Sample preparation for mass spectrometry
Protein samples from wild-type Msmeg cells (strain mc2155, bBEL246), MceG-GFP strain (bBEL591), LucB-GFP (bBEL595) strain were purified using the protein purification method described above. Three biological replicates were performed for each strain and analyzed by mass spectrometry. Affinity purified proteins were reduced with DTT at 57 ˚C for 1 hour (2 µL of 0.2 M) and subsequently alkylated with iodoacetamide at room temperature in the dark for 45 minutes (2 µL of 0.5 M). To remove detergents and other buffer components the samples were loaded onto a NuPAGE® 4-12% Bis-Tris Gel 1.0 mm (Life Technologies Corporation). The gel was run for approximately 25 minutes at 200 V. The gel was stained using GelCode Blue Stain Reagent (Thermo Scientific). The entire protein band was excised, extracted and analyzed in a single mass spectrometry analysis per gel lane. The excised gel pieces were destained in 1:1 v/v solution of methanol and 100 mM ammonium bicarbonate solution using at least three exchanges of destaining solution. The destained gel pieces were partially dehydrated with an acetonitrile rinse and further dried in a SpeedVac concentrator until dry. 200 ng of sequencing grade modified trypsin (Promega) was added to each sample. After the trypsin was absorbed, 250 µL of 100 mM ammonium bicarbonate was added to cover the gel pieces. Digestion proceeded overnight on a shaker at room temperature. The solution was removed and placed into a separate Eppendorf tube. The gel pieces were covered with a solution of 5% formic acid and acetonitrile (1:2; v:v) and incubated with agitation for 15 min at 37°C. The extraction buffer was removed and placed into the Eppendorf tube with the previously removed solution. This was repeated three times and the solution dried in the SpeedVac concentrator. The samples were reconstituted in 0.5% acetic acid and loaded onto equilibrated Micro spin columns (Harvard apparatus) using a micro centrifuge. 
The bound peptides were washed three times with 0.1% TFA followed with one wash with 0.5% TFA. Peptides were eluted by the addition of 40% acetonitrile in 0.5% acetic acid followed by 80% acetonitrile in 0.5% acetic acid. The organic solvent was removed using a SpeedVac concentrator and the sample reconstituted in 0.5% acetic acid and kept at -80 °C until analysis.
Mass spectrometry data collection
LC separation was performed online on an EASY-nLC 1200 (Thermo Scientific) utilizing an Acclaim PepMap 100 (75 µm x 2 cm) precolumn and a PepMap RSLC C18 (2 µm, 100 Å x 50 cm) analytical column. Peptides were gradient eluted directly to an Orbitrap Elite mass spectrometer (Thermo Fisher) using a 95 min acetonitrile gradient from 5 to 35% B in 60 min followed by a ramp to 45% B in 10 min and 100% B in another 10 min with a hold at 100% B for 10 min (A=2% acetonitrile in 0.5% acetic acid; B=80% acetonitrile in 0.5% acetic acid). Flow rate was set to 200 nL/min. High resolution full MS spectra were acquired every three seconds with a resolution of 120,000, an AGC target of 4e5, a maximum ion injection time of 50 ms, and a scan range of 400 to 1500 m/z. Following each full MS scan, data-dependent HCD MS/MS scans were acquired in the Orbitrap using a resolution of 30,000, an AGC target of 2e5, a maximum ion time of 200 ms, one microscan, a 2 m/z isolation window, a normalized collision energy (NCE) of 27, and dynamic exclusion of 30 seconds. Only ions with a charge state of 2-5 were allowed to trigger an MS2 scan.
Analysis of mass spectrometry data
The MS/MS spectra were searched against the NCBI Mycobacterium smegmatis database, to which common lab contaminants and the sequences of the tagged bait proteins were added, using SEQUEST within Proteome Discoverer 1.4 (Thermo Fisher). The search parameters were as follows: mass accuracy better than 10 ppm for MS1 and 0.02 Da for MS2, two missed cleavages, fixed modification carbamidomethyl on cysteine, and variable modifications of oxidation on methionine and deamidation on asparagine and glutamine. The data were filtered using a 1% FDR cutoff for peptides and proteins against a decoy database, and only proteins with at least 2 unique peptides were reported in Supplementary Table 2.
To obtain a probabilistic score (SAINT score) that a protein is an interactor of either MceG or LucB, the data were analyzed using the SAINT Express algorithm59. A one-sided volcano plot was generated showing fold change (Tag/WT) versus SAINT score. Proteins with a SAINT score ≥0.67 yielded an FDR of ≤5% and were considered potential interactors. Analyzed data are annotated in Supplementary Table 3 (for MceG) and in Supplementary Table 5 (for LucB) and plotted in Fig. 1f (for MceG) and Extended Data Fig. 9e (for LucB), respectively, using Prism (GraphPad).
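The thresholding step described above can be sketched as follows. This does not run SAINTexpress; it only applies the stated cutoff (SAINT score ≥ 0.67) to scores that are assumed to have been computed already. The `Prey` record, the prey names, and all numeric values in the example are hypothetical.

```python
from dataclasses import dataclass

SAINT_CUTOFF = 0.67  # threshold stated in the text (corresponding to FDR <= 5%)


@dataclass
class Prey:
    name: str            # hypothetical identifier
    saint_score: float   # probability-like score from SAINTexpress
    fold_change: float   # Tag/WT enrichment


def potential_interactors(preys, cutoff=SAINT_CUTOFF):
    """Keep preys at or above the cutoff, highest SAINT score first."""
    hits = [p for p in preys if p.saint_score >= cutoff]
    return sorted(hits, key=lambda p: p.saint_score, reverse=True)


# Made-up scores purely for illustration; not data from this study.
example = [
    Prey("prey_A", 0.99, 35.0),
    Prey("prey_B", 0.67, 4.2),
    Prey("prey_C", 0.40, 1.5),
]
for hit in potential_interactors(example):
    print(hit.name, hit.saint_score, hit.fold_change)
```

The retained preys are the points that would be highlighted on a one-sided volcano plot of fold change versus SAINT score.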
Cryo-EM sample preparation
The MceG-GFP complex was freshly purified as described above. Gel filtration fractions corresponding to higher-molecular weight complexes containing MceG were screened by negative-stain electron microscopy. Fractions of interest were then concentrated to ~1.7 mg/mL in cryo-EM buffer (50 mM Tris-HCl pH 7.5, 5 mM MgSO4, 150 mM NaCl, 1 mM DDM, and 1 mM DTT). Continuous carbon grids (Quantifoil R 2/2 on Cu 300 mesh grids + 2 nm Carbon, Quantifoil Micro Tools C2-C16nCu30-01) were glow-discharged for 5 sec in an easiGlow Glow Discharge Cleaning System (Ted Pella Inc.). 3.5 μL sample was added to the glow-discharged grid. Using a Vitrobot Mark IV (Thermo Fisher Scientific), grids were blotted for 3-3.5 seconds at 22 ºC with 100% chamber humidity and plunge-frozen into liquid ethane. Grids were clipped for screening.
Cryo-EM screening and data collection
Clipped cryo-EM grids were screened at the NYU Cryo-EM Laboratory on a Talos Arctica (Thermo Fisher Scientific) equipped with a K3 camera (Gatan). Images of the grids were collected at a nominal magnification of 36,000x (corresponding to a pixel size of 1.0965 Å) with a total dose of 50 e- per Å2, over a defocus range of -2.0 to -3.0 µm. Grids were selected for data collection based on ice quality and particle distribution. Selected cryo-EM grids were imaged at the Pacific Northwest Center for Cryo-EM on “Krios 2”, a Titan Krios G3 electron microscope (Thermo Fisher Scientific) equipped with a K3 BioContinuum direct electron detector (Gatan). Super-resolution movies were collected at 300 kV using SerialEM76 at a nominal magnification of 105,000x, corresponding to a super-resolution pixel size of 0.41275 Å (or a nominal pixel size of 0.8255 Å after binning by 2). Movies were collected over a defocus range of -0.8 to -2.4 µm and each movie consisted of 60 frames with a total dose of 60 e- per Å2. A total of 43,925 movies were collected, consisting of 21,915 movies at 0° tilt and 22,010 movies at -30° tilt. Further data collection parameters are shown in Supplementary Table 4.
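The data collection figures above are related by simple arithmetic, sketched here as a consistency check. The per-frame dose and the Nyquist limit are derived quantities that the text does not state; the per-frame value assumes the dose was spread evenly over the frames.

```python
SUPER_RES_PIXEL_A = 0.41275   # Å, super-resolution pixel size (from the text)
BINNING = 2                   # binning factor (from the text)
FRAMES_PER_MOVIE = 60
TOTAL_DOSE = 60.0             # e- per Å^2 per movie
MOVIES_UNTILTED = 21_915      # 0° tilt
MOVIES_TILTED = 22_010        # -30° tilt

binned_pixel_a = SUPER_RES_PIXEL_A * BINNING       # nominal pixel size after binning
dose_per_frame = TOTAL_DOSE / FRAMES_PER_MOVIE     # assumes uniform dose fractionation
total_movies = MOVIES_UNTILTED + MOVIES_TILTED
nyquist_a = 2 * binned_pixel_a                     # best resolution representable at this sampling

print(f"binned pixel size: {binned_pixel_a:.4f} Å")
print(f"dose per frame:    {dose_per_frame:.2f} e-/Å^2")
print(f"total movies:      {total_movies}")
print(f"Nyquist limit:     {nyquist_a:.3f} Å")
```

The binned pixel size and movie total reproduce the values reported in the text.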
Cryo-EM data processing
The dataset was split up into batches of 1,000 movies (45 batches total) and processed in cryoSPARC v3.3.160, as described in figs. S3 and S4. Dose-fractionated movies were gain-normalized, drift-corrected, summed, and dose-weighted using the cryoSPARC Patch Motion module. The contrast transfer function was estimated for each summed image using cryoSPARC Patch CTF.
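As a small bookkeeping sketch, the batch count can be related to the movie counts reported for data collection (21,915 untilted and 22,010 tilted movies). The assumption that the tilted and untilted movies were batched separately is ours and is not stated in the text; it is simply one reading that yields 45 batches.

```python
import math

BATCH_SIZE = 1_000
MOVIES = {"untilted": 21_915, "tilted": 22_010}  # counts from the data collection section


def n_batches(n_movies, batch_size=BATCH_SIZE):
    """Number of batches needed, with the last batch allowed to be partial."""
    return math.ceil(n_movies / batch_size)


# Assumption: each tilt group is batched on its own.
separate = sum(n_batches(n) for n in MOVIES.values())
# Alternative: all movies pooled before batching.
pooled = n_batches(sum(MOVIES.values()))

print(f"batched separately: {separate}")  # 22 + 23
print(f"batched pooled:     {pooled}")
```

Batching the two tilt groups separately gives 45 batches, whereas pooling all movies first would give 44.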
From the first batch of 1,000 images, 27 particles were manually picked in cryoSPARC that were then extracted (boxsize = 480 pixel (px)) and used to train within the Topaz Train module77 in cryoSPARC (expected number of particles = 50, use pre-trained initialization, ResNet16). After training, particles were picked using the trained Topaz model and extracted (10,618 particles, box size = 480 px). CryoSPARC 2D classification (N = number of classes = 50) was performed and particles from 2D classes with high resolution detail were selected (1,051 particles) for Topaz Train (expected number of particles = 300, use pre-trained initialization, ResNet16). Trained Topaz model was used to pick and extract 105,604 (box size 480) particles that were curated by 2D classification (N = 50). Particles from the well-defined classes were selected (14,402 particles after removing duplicates) and further curated using 2D classification (N = 50).
Particles from classes representing top, side, and tilted views were selected (2,887 particles) and processed using cryoSPARC Ab initio Reconstruction to generate an initial 3D model (Ref 1: Complex (1,268 particles), Ref 2 (919 particles), Ref 3 (700 particles)). To generate decoys for downstream particle curation, 50,927 ‘junk’ particles were selected from the 2D classification and processed using cryoSPARC Ab initio Reconstruction to generate three decoy models (Decoy 1 (17,094 particles), Decoy 2 (16,915 particles), and Decoy 3 (16,918 particles)). For a more isotropic reconstruction in 3D, the 1,268 particles from Ref 1 were sorted in 2D (N = 10) and different views of the particles were selected individually: side (588 particles), tilted (505 particles), top (43 particles). These selected particles were used to generate Topaz models to specifically pick side, tilted, and top views of the particle through the Topaz Train module (expected number of particles = 300, use pre-trained initialization, ResNet16).
Using these Topaz picking models, separate Topaz Extract jobs were performed for each view, and particles were extracted (box size 480, binned by 4) and combined. The combined particles were curated by cryoSPARC 2D classification (N = 50), subjected to duplicate removal (alignments2D), and curated in 3D using cryoSPARC Heterogeneous Refinement (N = 4, templates = (1) Decoy1, (2) Decoy2, (3) Decoy3, (4) Model). Particles sorted into template 4 (Model) were selected for further processing. This curation scheme was performed on each batch of micrographs, resulting in 2,869,223 curated particles, of which 1,820,584 particles came from the non-tilted images and the remaining 1,048,639 particles came from the -30° tilted images.
Particles were re-extracted (box size = 360 px, unbinned) and were further curated by running six rounds of Heterogeneous Refinement (N = 4, templates = (1) Decoy1, (2) Decoy2, (3) Decoy3, (4) Model), in which particles that were sorted into template 4 (Model) were used as input for the next round. After multiple rounds of Heterogeneous refinement (round 1: 992,273 particles, round 2: 637,446 particles, round 3: 510,255 particles, round 4: 468,001 particles, round 5: 437,324 particles, round 6: 414,343 particles) and removing remaining duplicates (alignment3D), the 341,566 curated particles were refined using cryoSPARC Non-Uniform Refinement78 generating a consensus map at 2.83 Å-resolution.
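To make the effect of the iterative curation easier to follow, the sketch below tabulates the fraction of particles retained at each step, using only the counts given in the text (2,869,223 curated particles as input, the six Heterogeneous Refinement rounds, and the count after duplicate removal). The step labels are ours, and nothing here reproduces cryoSPARC's own logic.

```python
COUNTS = [
    ("input (curated per batch)", 2_869_223),
    ("round 1", 992_273),
    ("round 2", 637_446),
    ("round 3", 510_255),
    ("round 4", 468_001),
    ("round 5", 437_324),
    ("round 6", 414_343),
    ("after duplicate removal", 341_566),
]


def retention(counts):
    """Fraction kept at each step relative to the step before it."""
    values = [n for _, n in counts]
    return [after / before for before, after in zip(values, values[1:])]


fractions = retention(COUNTS)
for (label, n), frac in zip(COUNTS[1:], fractions):
    print(f"{label:<26} {n:>9,d}  kept {frac:6.1%} of previous step")

overall = COUNTS[-1][1] / COUNTS[0][1]
print(f"overall retention: {overall:.1%}")
```

The shrinking per-round losses are what one would expect if the decoy classes absorb progressively fewer particles as the set converges.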
Heterogeneity was observed around the inner membrane (IM) region of the complex, so particles were subjected to a round of Heterogeneous Refinement (N = 4, templates = (1-4) consensus map). Class a (48,786 particles) and class b (113,261 particles) both contained additional density corresponding to extra protein in the IM region and were combined, whereas the additional density was not observed in class c (59,724 particles) or class d (119,795 particles). Class c and class d were very similar when compared by visual inspection, and these two classes were therefore combined. Non-uniform refinement was performed on the combined sets of particles, resulting in two major classes (both containing density for MceG, YrbE1AB, and Mce1ABCDEF): Class 1, which contains the extra protein density (162,047 particles, 2.94 Å), and Class 2, which lacks this density (179,519 particles, 3.04 Å).
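The class pooling described above can be checked with a few lines of bookkeeping. The sketch below uses only the particle counts from the text; the dictionary keys and helper name are illustrative.

```python
CLASSES = {"a": 48_786, "b": 113_261, "c": 59_724, "d": 119_795}
WITH_EXTRA_DENSITY = ("a", "b")     # pooled into Class 1
WITHOUT_EXTRA_DENSITY = ("c", "d")  # pooled into Class 2


def pooled(labels):
    """Total particle count across the given 3D classes."""
    return sum(CLASSES[label] for label in labels)


class1 = pooled(WITH_EXTRA_DENSITY)
class2 = pooled(WITHOUT_EXTRA_DENSITY)
total = class1 + class2

print(f"Class 1: {class1:,d} particles ({class1 / total:.1%} of the consensus set)")
print(f"Class 2: {class2:,d} particles ({class2 / total:.1%} of the consensus set)")
```

The two pooled classes sum to the 341,566 particles of the consensus set, so no particles were discarded at this classification step.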
Local refinements were performed for each class by recentering the particles on the region of interest using cryoSPARC Volume Alignment Tool, re-extracting the particles with the new center (box size = 360 px, unbinned), refining the particles on the re-centered 3D template using Non-uniform Refinement, performing particle subtraction in cryoSPARC using a mask around the region of interest, followed by refinement using cryoSPARC Local Refinement of the subtracted particles. This procedure was performed on each class to generate locally refined maps for the following regions: (i) MceG2, (ii) YrbE1AB+Mce1ABCDEF(transmembrane helix+transmembrane domains+Mce ring)+/-extra factor, (iii) Mce1ABCDEF(Mce ring+ first half of C-terminal Mce needle), (iv) Mce1ABCDEF (second half of C-terminal Mce needle). For class 1, the following maps were generated for corresponding regions: (i) Map1a (161,434 particles, 3.05 Å), (ii) Map1b (162,004 particles, 2.89 Å), (iii) Map1c (158,508 particles, 2.97 Å), (iv) Map1d (156,741 particles, 3.16 Å). For Class 2, the following maps were generated for each region: (i) Map2a (178,844 particles, 3.13 Å), (ii) Map2b (179,480 particles, 2.99 Å), (iii) Map2c (175,490 particles, 3.06 Å), (iv) Map2d (173,315 particles, 3.19 Å). To generate a composite map, particles from each class were re-extracted with a box size of 640 px (unbinned) and refined using Non-Uniform Refinement to generate maps that included the entire complex (Map1e for Class 1 and Map2e for Class 2). These maps were used as a template to stitch the locally refined maps together to generate a composite density map. In regions aside from the extra density (later assigned as LucB/MSMEG_3032), these maps were lower resolution compared to the map from the consensus set of particles before classification, but did not show any notable differences compared with the consensus map. 
Therefore, local refinements were performed on the consensus set of particles in a fashion similar to that used to generate maps for model building, but with the MSMEG_3032/LucB density masked out.
Local refinements were performed on the set of particles from the consensus refinement using the same approach that was applied to Class 1 and Class 2. This procedure was applied to the following regions: (i) MceG2, (ii) YrbE1AB+Mce1ABCDEF(transmembrane helix+transmembrane domains+Mce ring), masking out density for the extra factor, (iii) Mce1ABCDEF(Mce ring+first half of C-terminal Mce needle), (iv) Mce1ABCDEF (second half of C-terminal Mce needle). For the consensus map, the following locally refined maps were generated for each region: (i) Map0a (340,238 particles, 2.91 Å), (ii) Map0b (341,490 particles, 2.73 Å), (iii) Map0c (332,050 particles, 2.75 Å), (iv) Map0d (330,104 particles, 3.00 Å). To generate a composite map, the consensus set of particles was also re-extracted with a box size of 640 px (unbinned) and refined using Non-Uniform Refinement to generate a map that included the entire complex (Map0e). This map was used as a template to stitch the locally refined maps together to generate a composite density map. These maps were of much higher quality than the locally refined maps of Class 1 and Class 2 and were thus used for initial model building.
For each map, the overall resolution reported in cryoSPARC was estimated using the gold-standard Fourier Shell Correlation criterion (FSC = 0.143). Directional FSCs were computed using 3DFSC65. Local resolution maps were computed using the cryoSPARC Local Resolution Estimation module. Locally refined maps were combined into composite maps for the consensus map, Class 1 and Class 2 using the PHENIX v1.20.1 ‘Combine Focused Maps’ module64. Composite maps were generated for sharpened maps and half maps (for calculating FSC and estimating local resolution of the composite maps). For the consensus composite map, maps 0a, 0b, 0c, and 0d were combined using Map0e as a template to generate Map0. For the Class 1 composite map, maps 1a, 1b, 1c, and 1d were combined using Map1e as a template to generate Map1. For the Class 2 composite map, maps 2a, 2b, 2c, and 2d were combined using Map2e as a template to generate Map2. Global FSCs were calculated by importing composite half maps into the ‘Validation FSC’ cryoSPARC module and local resolution was estimated using the ‘Local Resolution’ cryoSPARC module. The nominal global resolution was estimated to be 2.71 Å for Map0, 2.76 Å for Map1 and 2.90 Å for Map2. Directional FSCs for the composite maps were computed using 3DFSC in cryoSPARC.
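For readers unfamiliar with the FSC = 0.143 criterion, the sketch below shows how a resolution value can be read from an FSC curve by linear interpolation at the first crossing of the threshold. It is not cryoSPARC's implementation, and the curve in the example is synthetic, not data from this study.

```python
def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Resolution (Å) where the FSC curve first drops below the threshold.

    freqs: spatial frequencies in 1/Å, ascending.
    fsc:   FSC values at those frequencies.
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            f0, f1 = freqs[i - 1], freqs[i]
            c0, c1 = fsc[i - 1], fsc[i]
            # Linear interpolation between the two shells bracketing the crossing.
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            return 1.0 / f_cross
    return None


# Synthetic curve for illustration only.
example_freqs = [0.1, 0.2, 0.3, 0.4]
example_fsc = [1.0, 0.9, 0.5, 0.0]
print(f"{resolution_at_threshold(example_freqs, example_fsc):.2f} Å")
```

In practice the half-map FSC is computed over many more shells, so the interpolation step matters less than it does in this four-point example.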
Model building and refinement
The mass spectrometry data indicated a mixture of Mce1 and Mce4 proteins in the cryo-EM sample. To assess which proteins were present in the cryo-EM reconstruction, and their stoichiometry and position in the complex, we generated AlphaFold263 predictions for each MCE-related protein and assessed their fit into the consensus reconstruction, which contains the ATP-binding cassette (ABC) transporter and the MCE ring. Using ColabFold79, AlphaFold263 predictions were generated for MceG (AFpdb1), Mce1 proteins (AFpdb1-9), Mce4 proteins (AFpdb10-17), and orphaned MCE protein (AFpdb18). Predictions are summarized in Supplementary Table 6. We performed rigid-body fits of the predicted structures into the cryo-EM map using UCSF Chimera v1.1680, and determined that the complex consisted of two protomers of MceG, two protomers of YrbEs, and six MCE proteins. The two protomers of MceG (AFpdb1) fit unambiguously into the density that corresponded to the ATPase component of the ABC transporter. For YrbE and MCE proteins, we further refined the rigid-body fitted models using real-space refinement in PHENIX v1.20.164. We then examined regions of each protein where the sequences are divergent between candidate proteins and used side chain density to assign the correct subunit. The YrbE subunits (AFpdb2-3,10-11) were fit as rigid bodies into the transmembrane region of the cryo-EM map using UCSF Chimera and refined in real space using PHENIX. The refined models were manually inspected in COOT v0.8.9.281 to assess the overall fit of the Cα backbone and side chains of each protein into the map. Based on manual inspection, we assigned the cryo-EM density to YrbE1A and YrbE1B. The MCE domains of each Mce1 (AFpdb4-9) or Mce4 (AFpdb12-17) protein were fitted into each position of the MCE ring (positions 1-6) using UCSF Chimera. Once fit into the density, the MCE domains were real-space refined in PHENIX and manually inspected in COOT. 
Based on this analysis, Mce1 proteins fit best into the map and were assigned the following positions in the MCE ring (going clockwise): 1) Mce1A, 2) Mce1E, 3) Mce1B, 4) Mce1C, 5) Mce1D, 6) Mce1F. Thus, using this approach, we are able to unambiguously assign Mce1 protein subunits into the cryo-EM map (Extended Data Fig. 4a). Notably, oMce1A (AFpdb18), which was identified in the mass spectrometry data and is 84% identical to Mce1A, fits well into the cryo-EM map at the same position as Mce1A, suggesting a possible mixture of Mce1A and oMce1A in the reconstruction. Focused 3D classification around regions that differ between the two proteins did not produce classes where the density was resolved enough to unambiguously assign Mce1A versus oMce1A. Mce1A was used for modeling the Mce1 complex since it belongs in the same operon as the other Mce1 proteins.
As a starting point for model building of the entire complex, AlphaFold263 and AlphaFold-Multimer82 were used to predict 3D structures of Mce1 proteins and subcomplexes as summarized in Supplementary Table 6. Predictions were performed on ColabFold79 and COSMIC2 83. The C-terminal region of AFpdb20 was trimmed starting at the following residues: Mce1A (residue 167), Mce1B (residue 151), Mce1C (residue 149), Mce1D (residue 160), Mce1E (residue 169), Mce1F (residue 149). For initial model building, AFpdb19, AFpdb20 (trimmed) and AFpdb21 were stitched together in the PyMOL Molecular Graphics System (version 2.5.1, Schrödinger, LLC). Briefly, chains were renamed for each prediction: Mce1A (chain A), Mce1B (chain B), Mce1C (chain C), Mce1D (chain D), Mce1E (chain E), Mce1F (chain F), MceG (chains G and H), YrbE1A (chain I), YrbE1B (chain J). Predicted models were aligned in PyMOL using the ‘align’ command: 1) AFpdb19 and AFpdb20 were aligned based on chains I and J, and 2) AFpdb3 was aligned to AFpdb2 based on the first α-helical module of the MCE proteins (chain A 150-167, chain B 134-151, chain C 134-149, chain D 145-160, chain E 151-169, chain F 135-149). Overlapping residues were trimmed and aligned models were stitched to produce a composite PDB of the Mce1 complex based on AlphaFold2 predictions.
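The residue ranges used for the second alignment step can be turned into a PyMOL selection expression programmatically. The sketch below only builds the command string and does not call PyMOL; the object names `mobile_model` and `target_model` are hypothetical placeholders, and the exact selections used by the authors are not given in the text.

```python
# Residue ranges of the first α-helical module, per chain, as listed in the text.
HELIX_MODULE = {
    "A": (150, 167),
    "B": (134, 151),
    "C": (134, 149),
    "D": (145, 160),
    "E": (151, 169),
    "F": (135, 149),
}


def selection(ranges):
    """Build a PyMOL selection covering the given residue range on each chain."""
    parts = [f"(chain {chain} and resi {lo}-{hi})" for chain, (lo, hi) in ranges.items()]
    return " or ".join(parts)


def align_command(mobile, target, ranges):
    """Return a PyMOL 'align' command restricted to the given ranges on both objects."""
    sel = selection(ranges)
    return f"align {mobile} and ({sel}), {target} and ({sel})"


# Hypothetical object names; substitute the names of the loaded predictions.
print(align_command("mobile_model", "target_model", HELIX_MODULE))
```

Restricting the alignment to the shared helical segment keeps the overlapping residues superposed so that the models can be trimmed and joined at that boundary.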
Of the three cryo-EM maps (Map0, Map1, Map2), Map0 has the highest resolution and the best-defined density. Thus, modeling of the Mce1 complex was performed on the locally refined maps corresponding to Map0 (Map0a-d), except for model building of LucB, which was carried out using Map1b. Note that Map0 includes Mce1 complex particles with and without LucB. However, since no conformational change in the Mce1 complex is apparent at the current resolution, the higher number of particles results in better-quality density for the Mce1 complex excluding LucB. Starting models were fitted into their corresponding locally refined maps using the “Fit in Map” function in UCSF Chimera. For each map, the PDB was trimmed to remove regions of the protein that were not defined in the map. Rigid-body fitting into the cryo-EM maps was performed using PHENIX. Fitted models were visually inspected and manually adjusted in COOT. Real-space refinement with Ramachandran and secondary structure restraints was carried out in PHENIX using 5 cycles and 100 iterations to optimize the fit and reduce clashes. These models were iteratively inspected, manually rebuilt in COOT and refined in PHENIX until completion. Models built into the locally refined maps were aligned and stitched together in PyMOL. These models served as templates to generate a composite density map (Map0) for the consensus set of particles using the PHENIX ‘Combine Focused Maps’ module.
In Map0, poly-carbon chain unknown ligands (UNLs) were manually built into extra densities corresponding to substrates, and real-space refined in COOT. Elongated ligands (LIG, Chemical string: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC) were generated using PHENIX eLBOW84. Planar ligands were derived from BNZ (benzene) and DKM (5-[(3S,4S)-3-(dimethylamino)-4-hydroxypyrrolidin-1-yl]-6-fluoro-4-methyl-8-oxo-3,4-dihydro-8H-1-thia-4,9b-diazacyclopenta[cd]phenalene-9-carboxylic acid). The composite model (containing ligands) was real-space refined into Map0 using PHENIX with global minimization, Ramachandran, secondary structure, and ligand restraints. We modeled the ligands as UNLs because the density clearly indicates the presence of additional molecules, but the resolution is not high enough to identify these molecules unambiguously.
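A chemical string made up only of the character C encodes an unbranched, saturated carbon chain with implicit hydrogens. As a small illustrative sketch (the function name is hypothetical and is not part of eLBOW), the chain length and molecular formula implied by such a string can be recovered as follows:

```python
def alkane_formula(smiles):
    """Return (carbon count, molecular formula) for a linear alkane string.

    Only strings made up solely of 'C' characters are accepted, i.e. an
    unbranched, fully saturated carbon chain with implicit hydrogens.
    """
    if not smiles or set(smiles) != {"C"}:
        raise ValueError("expected a chemical string consisting only of 'C'")
    n_carbon = len(smiles)
    n_hydrogen = 2 * n_carbon + 2  # general formula of an acyclic alkane
    return n_carbon, f"C{n_carbon}H{n_hydrogen}"
```

For example, alkane_formula("CCC") returns (3, "C3H8"), which corresponds to propane. Applying the same function to the chemical string quoted above gives the length of the placeholder chain used for the elongated ligands.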
Our final consensus model for Map0 is nearly complete, apart from regions in Mce1A (residues 1-17), Mce1C (residues 310-524), Mce1D (residues 1-41 and 268-547), Mce1E (residues 1-32), Mce1F (residues 400-518), MceG protomers (residues 1, 256-280, and 326-360), YrbE1A (residues 1-13), and YrbE1B (residues 1-26), which are predicted to be flexible or unstructured (Extended Data Fig. 4d). Notably, no transmembrane helix was observed for Mce1E (MSMEG_0138; Rv0173/LprK in Mtb). Mce1E has been proposed to be a lipoprotein due to the presence of a possible signal peptide and lipobox at its N-terminus85. Intriguingly, the first resolvable residue for Mce1E is C33, the cysteine that would be lipidated; however, density around this region was not sufficient to resolve this modification. In our mass spectrometry data, we do not detect N-terminal peptides for Mce1E, which suggests that this region may indeed be cleaved.
Models for Map1 and Map2 were built using the model for Map0 as the starting model. The Map0 model was fitted and trimmed into the locally refined maps of each class in UCSF Chimera and PyMOL. Real-space refinement with Ramachandran and secondary structure restraints was carried out in PHENIX. Models were iteratively inspected, manually rebuilt in COOT, and refined in PHENIX until completion. For Class 1, extra protein density was observed near the TM of Mce1C in the inner membrane region of Map1b that corresponded to an additional subunit bound to the complex, LucB. To determine the identity of this unknown protein, we used a combination of model building and AlphaFold2. The Cα backbone of the polypeptide was traced manually in COOT. This Cα model was used to search structural databases (AlphaFold/Swiss-Prot v2, AlphaFold/Proteome v2, PDB100 211201, GMGCL 2204) using TM-align mode in Foldseek57. One of the highest-ranking hits from this search (TM-score 0.9509) was a putative, conserved, integral membrane protein from Mycobacterium tuberculosis (Rv2536, AF-P95017-F1-model_v2.pdb) from the AlphaFold Protein Structure Database. The structure of the Msmeg ortholog of this protein (MSMEG_3032/LucB, AFpdb22) was predicted in ColabFold, docked into the cryo-EM density using Chimera, stitched into the model of Map1 using PyMOL, and refined in PHENIX. Completed locally refined models were then aligned and stitched together in PyMOL and used to generate a composite density map for Class 1 (Map1) and Class 2 (Map2) in PHENIX. Ligands were added to stitched models for Map1 and Map2 and models were real-space refined using PHENIX.
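For context on the reported TM-score, the sketch below evaluates the standard TM-score expression for one fixed superposition, given the Cα distances of the aligned residue pairs. It is an illustration only and is not the Foldseek implementation: the function names are hypothetical, the search over superpositions performed by alignment programs is omitted, and the text does not state which chain length was used for normalization.

```python
def tm_d0(length):
    """Length-dependent distance scale d0 (Å) of the TM-score."""
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8 if length > 15 else 0.0
    if d0 <= 0.0:
        raise ValueError("chain too short for this simplified sketch")
    return d0


def tm_score(distances, length):
    """TM-score for one superposition.

    distances -- Cα-Cα distances (Å) of the aligned residue pairs
    length    -- number of residues in the chain used for normalization
    """
    d0 = tm_d0(length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / length
```

A perfect superposition of all residues gives a score of 1, and each aligned pair separated by exactly d0 contributes half of its maximum weight, so a value close to 1 indicates that nearly every residue of the traced backbone superimposes closely on the hit.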
Statistics for the final models (Supplementary Table 4) were extracted from the results of the real_space_refine algorithm in PHENIX64 as well as MolProbity86 and EMRinger87. Structural alignments and associated RMSD values were calculated using UCSF Chimera v1.1680 and PyMOL (Schrödinger, LLC). FSCs that were calculated in cryoSPARC were plotted in GraphPad Prism v9.3.1. Mce1 tunnel volume was calculated using CASTp v3.061 with a probe radius of 2.2 Å and the inner diameter was calculated using MOLE v2.5 “pore mode”68. The cavity volume of the ABC transporter substrate-binding pocket was calculated using CASTp v3.0 with a probe radius of 2.2 Å. Figures and Supplementary Videos were generated with PyMOL (Schrödinger, LLC), UCSF Chimera and ChimeraX62.
Figure preparation
Figures in which map density is shown were prepared using ChimeraX62 with the following parameters:
- Fig. 2f. Map0 rendered with contour level 10.0.
- Fig. 3c. Ligand density from Map0 rendered using ChimeraX ‘volume zone’ with 3.0 Å distance cutoff around UNL1 and 7.6 contour level.
- Fig. 3f. Ligand density from Map0 rendered using ChimeraX ‘volume zone’ with 3.0 Å distance cutoff around UNL1-31 and 7.0 contour level.
- Fig. 4c. Ligand density from Map0 rendered using ChimeraX ‘volume zone’ with 2.5 Å distance cutoff around UNL9 and 5.0 contour level.
- Fig. 5a. Map1 rendered with contour level 10.0. Map2 rendered with contour level 10.0.
- Fig. 5b. Protein density from Map1 rendered using ChimeraX ‘volume zone’ with 2.5 Å distance cutoff around 3D model of poly-alanine Cα backbone and 8.0 contour level.
- Extended Data Fig. 3a. Locally refined maps for the consensus set of particles were contoured with the following levels: Map0a (0.281), Map0b (0.257), Map0c (0.259), Map0d (0.199), Map0e (0.17).
- Extended Data Fig. 3b. Map0 contoured to 12.7.
- Extended Data Fig. 3e. Locally refined maps for Class 1 were contoured with the following levels: Map1a (0.172), Map1b (0.201), Map1c (0.185), Map1d (0.167), Map1e (0.15).
- Extended Data Fig. 3f. Map1 contoured to 10.1.
- Extended Data Fig. 3i. Locally refined maps for Class 2 were contoured with the following levels: Map2a (0.177), Map2b (0.148), Map2c (0.163), Map2d (0.126), Map2e (0.15).
- Extended Data Fig. 3j. Map2 contoured to 10.2.
- Extended Data Fig. 4a. Protein densities rendered using ChimeraX ‘volume zone’ with 2.0 Å distance cutoff around the indicated protein residues with the following contour levels: Mce1A/oMce1A (6.0), Mce1F (14.0), Mce1E (10.0), MceG protomer 2 (10.0), Mce1C (8.0), MceG protomer 1. YrbE1A (12.0), Mce1D (8.0), Mce1B (8.0), YrbE1B (10.0).
- Extended Data Fig. 4b. Ligand densities rendered using ChimeraX ‘volume zone’ with 2.5 Å distance cutoff around UNLs and with the following contour levels: UNL1 (8.0), UNL4 (6.0), UNL20 (8.0).
- Extended Data Fig. 4c. Protein densities rendered using ChimeraX ‘volume zone’ with 2.5 Å distance cutoff around each TM of LucB and contour level 7.0.
- Extended Data Fig. 4d. Map0 contoured to 10.0.
- Extended Data Fig. 7c. Protein densities rendered using ChimeraX ‘volume zone’ with 2.0 Å distance cutoff around each PLL at contour level 10.0.
- Extended Data Fig. 8d. Protein densities rendered using ChimeraX ‘volume zone’ with 2.0 Å distance cutoff around YrbE1B C-terminus and Mce1F PLL and 8.7 contour level.
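The zoned renderings in the list above can be reproduced in outline with the ChimeraX ‘volume’ and ‘volume zone’ commands. The sketch below only assembles command strings; the model and atom specifiers are hypothetical placeholders, and it assumes that the quoted contour levels are passed directly to the ‘level’ option, since the list does not state whether they are absolute map values or multiples of the map standard deviation.

```python
def volume_zone_commands(map_id, atom_spec, cutoff, level):
    """Return ChimeraX command strings for a zoned, contoured density view.

    map_id    -- ChimeraX specifier of the map, e.g. "#1" (hypothetical)
    atom_spec -- atom specifier to zone around, e.g. "#2:UNL" (hypothetical)
    cutoff    -- 'volume zone' distance cutoff in Å
    level     -- contour level, passed to the 'level' option as given
    """
    return [
        f"volume {map_id} level {level}",
        f"volume zone {map_id} nearAtoms {atom_spec} range {cutoff}",
    ]


# Parameters quoted in the list above: panel -> (cutoff in Å, contour level).
PANELS = {
    "Fig. 3c": (3.0, 7.6),
    "Fig. 3f": (3.0, 7.0),
    "Fig. 4c": (2.5, 5.0),
}


def commands_for_panel(panel, map_id="#1", atom_spec="#2:UNL"):
    """Look up a panel's parameters and build its ChimeraX commands."""
    cutoff, level = PANELS[panel]
    return volume_zone_commands(map_id, atom_spec, cutoff, level)
```

The returned strings can be pasted into the ChimeraX command line once the map and model are open, after replacing the placeholder specifiers with those of the ligand or residues shown in the corresponding panel.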
Quantification and Statistical Analysis
The local resolution of the cryo-EM maps was estimated using cryoSPARC Local Resolution60. Directional 3DFSCs were calculated using 3DFSC65. The quantification and statistical analyses for model refinement and validation on deposited models were performed using PHENIX64, MolProbity86, and EMRinger87. Structural alignments and associated RMSD values were calculated using UCSF Chimera80 and PyMOL (Schrödinger, LLC). Tunnel and cavity volumes were calculated using CASTp v3.061 and tunnel diameter was estimated using MOLE v2.568. Multiple sequence alignments were generated using MUSCLE69 and JalView70. Phenotypic assays were replicated at least three times (n = 3). The mean and standard error of three replicates were plotted using Prism (GraphPad). Protein pulldowns were replicated at least three times (n = 3). MS data was analyzed using Proteome Discoverer 1.4 (Thermo Fisher Scientific) and the SAINT Express algorithm59 and plotted using Prism (GraphPad).
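As a worked illustration of the summary statistics plotted for the replicate assays, the sketch below computes the mean and the standard error of the mean for a set of replicate values. It assumes the standard error is the sample standard deviation (n - 1 denominator) divided by the square root of n; the function name is hypothetical and the calculation was performed in Prism in this work.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed for a standard error")
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem
```

For three replicate values of 1.0, 2.0 and 3.0, the mean is 2.0, the sample standard deviation is 1.0, and the standard error is 1/√3, or approximately 0.577.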
Data and code availability
The cryo-EM maps have been deposited in the Electron Microscopy Data Bank with accession codes: Map0 (EMD-29025), Map0a (EMD-29228), Map0b (EMD-29229), Map0c (EMD-29230), Map0d (EMD-29231), Map0e (EMD-29232), Map1 (EMD-29023), Map1a (EMD-29233), Map1b (EMD-29234), Map1c (EMD-29235), Map1d (EMD-29236), Map1e (EMD-29237), Map2 (EMD-29024), Map2a (EMD-29238), Map2b (EMD-29239), Map2c (EMD-29240), Map2d (EMD-29241), and Map2e (EMD-29242). The coordinates of the atomic models have been deposited in the Protein Data Bank under accession codes: PDB 8FEF (model for Map0), PDB 8FED (model for Map1), PDB 8FEE (model for Map2). Cryo-EM data was deposited in the Electron Microscopy Public Image Archive: EMPIAR-11343. The mass spectrometry files are available at MassIVE (https://massive.ucsd.edu) with dataset identifier MSV000090807 and ProteomeXchange (proteomexchange.org) with identifier PXD038456. Bacterial strains and plasmids have been deposited in Addgene and identifiers are listed in Supplementary Table 1.
Methods References
59. Choi, H. et al. Analyzing protein-protein interactions from affinity purification-mass spectrometry data with SAINT. Curr. Protoc. Bioinformatics Chapter 8, Unit8.15 (2012).
60. Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017).
61. Tian, W., Chen, C., Lei, X., Zhao, J. & Liang, J. CASTp 3.0: computed atlas of surface topography of proteins. Nucleic Acids Res. 46, W363–W367 (2018).
62. Pettersen, E. F. et al. UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Sci. 30, 70–82 (2021).
63. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
64. Liebschner, D. et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr. D Struct. Biol. 75, 861–877 (2019).
65. Tan, Y. Z. et al. Addressing preferred specimen orientation in single-particle cryo-EM through tilting. Nat. Methods 14, 793–796 (2017).
66. Suits, M. D. L., Sperandeo, P., Dehò, G., Polissi, A. & Jia, Z. Novel structure of the conserved gram-negative lipopolysaccharide transport protein A and mutagenesis analysis. J. Mol. Biol. 380, 476–488 (2008).
67. Botos, I. et al. Structural and Functional Characterization of the LPS Transporter LptDE from Gram-Negative Pathogens. Structure 24, 965–976 (2016).
68. Pravda, L. et al. MOLEonline: a web-based tool for analyzing channels, tunnels and pores (2018 update). Nucleic Acids Res. 46, W368–W373 (2018).
69. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
70. Waterhouse, A. M., Procter, J. B., Martin, D. M. A., Clamp, M. & Barton, G. J. Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009).
71. Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314 (2019).
72. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49, W293–W296 (2021).
73. Snapper, S. B., Melton, R. E., Mustafa, S., Kieser, T. & Jacobs, W. R., Jr. Isolation and characterization of efficient plasmid transformation mutants of Mycobacterium smegmatis. Mol. Microbiol. 4, 1911–1919 (1990).
74. Pleiner, T. et al. Structural basis for membrane insertion by the human ER membrane protein complex. Science 369, 433–436 (2020).
75. Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012).
76. Mastronarde, D. N. Automated electron microscope tomography using robust prediction of specimen movements. J. Struct. Biol. 152, 36–51 (2005).
77. Bepler, T., Kelley, K., Noble, A. J. & Berger, B. Topaz-Denoise: general deep denoising models for cryoEM and cryoET. Nat. Commun. 11, 5208 (2020).
78. Punjani, A., Zhang, H. & Fleet, D. J. Non-uniform refinement: adaptive regularization improves single-particle cryo-EM reconstruction. Nat. Methods 17, 1214–1221 (2020).
79. Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
80. Pettersen, E. F. et al. UCSF Chimera--a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
81. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr. 66, 486–501 (2010).
82. Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at https://doi.org/10.1101/2021.10.04.463034 (2021).
83. Cianfrocco, M. A., Wong-Barnum, M., Youn, C., Wagner, R. & Leschziner, A. COSMIC2. In Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact https://doi.org/10.1145/3093338.3093390 (2017).
84. Moriarty, N. W., Grosse-Kunstleve, R. W. & Adams, P. D. electronic Ligand Builder and Optimization Workbench (eLBOW): a tool for ligand coordinate and restraint generation. Acta Crystallogr. D Biol. Crystallogr. 65, 1074–1080 (2009).
85. Sutcliffe, I. C. & Harrington, D. J. Lipoproteins of Mycobacterium tuberculosis: an abundant and functionally diverse class of cell envelope components. FEMS Microbiol. Rev. 28, 645–659 (2004).
86. Williams, C. J. et al. MolProbity: More and better reference data for improved all-atom structure validation. Protein Sci. 27, 293–315 (2018).
87. Barad, B. A. et al. EMRinger: side chain-directed model and map validation for 3D cryo-electron microscopy. Nat. Methods 12, 943–946 (2015).