Deciphering the transcriptional changes in Escherichia coli strains C41(DE3) and C43(DE3) that make them a superior choice for membrane protein production.

Background: The production of membrane proteins for functional and structural protein analysis remains a bottleneck in the continuing quest for understanding biological systems. For recombinant membrane proteins, the Walker strains C41(DE3) and C43(DE3) are a valuable tool because they are capable of producing levels of functional protein that would otherwise be toxic to the cell. At the genome level, amongst only a handful of genetic changes, mutations in the lacUV5 promoter region upstream from the bacteriophage T7 RNA polymerase gene distinguish these strains from BL21(DE3) but do not inform on how the strains have adapted for superior production of recombinant membrane proteins. Results: Comparative transcriptomic analyses revealed a moderate change in gene expression in C41(DE3) and C43(DE3) compared to their parent strain BL21(DE3) under standard growth conditions. However, under the conditions used for membrane protein production (with plasmid carriage and addition of IPTG), the differential response of C41(DE3) and C43(DE3) compared to their parent strain BL21(DE3) was striking. Over 2000 genes were differentially expressed in C41(DE3) with a two-fold change and false discovery rate < 0.01 and 1700 genes differentially expressed in C43(DE3) compared to their parent strain BL21(DE3). Conclusion: These results illuminate the cellular adaptations occurring in the Walker strains to alleviate the toxic effects that can occur during membrane protein production, whilst providing changes in the metabolic pathways required for membrane protein biogenesis. The BL21(DE3) derivative strains C41(DE3) and C43(DE3) are adapted to the process of membrane biogenesis in E. coli, making them superior to their parent strain for the production of membrane proteins and potentially other toxic proteins.


Background
Early estimates that up to 30% of all proteins are integral membrane proteins (1) have been confirmed by proteo-genomic assessments (2). In most cases, solving the structures of these membrane proteins has rested on the seemingly intractable problem of how to generate sufficient quantities of isolated membrane protein. A major hurdle is the set of cellular constraints that control membrane protein synthesis, targeting and folding (2-6). In this quest, many E. coli strains have been developed to enhance the cell's ability to overexpress native and non-native proteins. The BL21(DE3) strain was engineered to express a bacteriophage T7 RNA polymerase (T7RNAP) that transcribes a gene of interest at high efficiency, thus producing large amounts of the corresponding protein (7,8). In BL21(DE3), the gene encoding T7RNAP is under the control of the lacUV5 promoter, a stronger variant of the endogenous lac promoter (9). The promoter is subject to repression by LacI. When isopropyl-β-D-thiogalactopyranoside (IPTG) is added to a cell containing this system, it causes LacI to dissociate from the lacUV5 promoter region, resulting in the production of T7RNAP and subsequent gene expression.
Numerous studies aimed at high-level expression of membrane proteins in BL21(DE3) have failed, with the membrane protein deemed "toxic". This represents a common bottleneck during protein expression trials (10,11). A screen for mutations in the BL21(DE3) host that could survive high-level expression of a membrane protein (the mitochondrial oxoglutarate-malate carrier) recovered the mutant host C41(DE3) (12). The carrier was produced in inclusion bodies, but the cells grew to a high density without signs of toxicity. One membrane protein, subunit b of bacterial F1F0-ATPase, was not tolerated by strain C41(DE3), so a further selection was undertaken, generating strain C43(DE3), which could produce the protein assembled in the bacterial inner membrane with no toxicity to the cell. C43(DE3) also successfully expressed at least four membrane proteins that could otherwise not be expressed: subunit c of the F1F0-ATPase, an alanine-H+ symporter, the mitochondrial ADP/ATP carrier and the mitochondrial phosphate carrier (12). A subsequent independent evaluation suggested that for 66% of the expression constructs tested, the "toxicity" of the plasmids was so high as to prevent identification of any plasmid-transformed BL21(DE3) colonies. The same test, when performed in the C41(DE3) or C43(DE3) strains, demonstrated that all expression constructs could be recovered from transformants, with varying expression levels of each membrane protein tested (13).
More than a decade after their discovery, comparative sequence analysis revealed the genetic differences between BL21(DE3) and its derivatives C41(DE3) and C43(DE3) (14,15). There are seven mutations in C41(DE3), and twelve in C43(DE3), compared to BL21(DE3): common to both derivatives are three single nucleotide polymorphisms (SNPs) in the lacUV5 promoter region of the gene encoding T7RNAP. These mutations are responsible for very low levels of T7RNAP compared to the levels in BL21(DE3) upon the addition of IPTG, and result in an improvement in protein production (15,16). Mutations in the genes yehU and rbsD are present in both C41(DE3) and C43(DE3); however, further analysis of these two genes, encoding a putative two-component sensor protein (YehU) and D-ribose pyranase (RbsD), ruled out any role for these factors in membrane protein expression (14). In addition to the common mutations, C41(DE3) also contains point mutations in three genes encoding inner membrane proteins (proY, melB, ycgO) and in yhhA, which encodes a secreted, natively disordered protein of unknown function. Since all these changes had reverted in the C43(DE3) derivative, they were deemed not to be important for membrane protein expression. C43(DE3) contains mutations in the genes dcuS, fur, yibJ, yjcO and lacI (14). There are two copies of lacI on the BL21(DE3) chromosome, one next to the lac operon and the second in the DE3 region. The mutation of lacI in C43(DE3) was mapped to the latter. In addition, there is an IS1 element inserted into the promoter of cydA, and an excision of an IS4 element restores expression of lon, which encodes the ATP-dependent protease Lon (14,16). The Lon protease is associated with regulated protein degradation for the purpose of protein quality control (17,18). Two large genomic deletions across ccmF~ompC and yjiV-yjjN were also identified only in C43(DE3) (14).
The general features that make a membrane protein particularly toxic to E. coli can be gleaned from considering the process of membrane protein biogenesis. Transmembrane proteins have amino acid compositions skewed in favour of hydrophobic residues, particularly Leu, Ile, Val, Phe and Ala (19-24), and the hydroxylated amino acid residues Ser and Tyr (21). This establishes two factors that can be rate-limiting for the translation of membrane proteins: (i) the activity of the metabolic pathways that synthesize these amino acids, and (ii) aminoacyl-tRNA availability. The transcripts for membrane proteins often feature rare codons, and these are known to affect overall membrane protein expression levels through several mechanisms that influence mRNA stability and rates of protein synthesis (4,6). Furthermore, high-level membrane protein accumulation depends on the availability of molecular chaperones and protein translocases that catalyze the controlled assembly of the nascent proteins into the bacterial membranes (25).
This study assessed the differential gene expression of C41(DE3) and C43(DE3) strains in comparison to their parental strain BL21(DE3), in the presence and absence of a prototypical protein expression vector with or without the inducer IPTG. The analysis details an array of changes implemented throughout the cell. We describe the trends observed and comment on their relevance to enhance membrane protein expression whilst ensuring the expression remains nontoxic to the cell.
We first wanted to investigate what changes occur in C41(DE3) and C43(DE3) compared to the parental strain BL21(DE3) in the absence of protein expression. Each strain was grown in rich medium (Luria Broth, LB) to mid-log phase (OD600nm = 0.6) at 37 °C. RNA was then isolated and subjected to transcriptional analysis. Each experiment was performed in biological triplicate. RNA libraries were prepared and sequenced. Reads were processed using RNAsik (26), mapping all reads to the reference strain BL21(DE3) CP001509.3, which was recently updated (27), and differential expression was analysed in Degust (28).
The data quality was confirmed using Degust (Table S1). Using an absolute log2 fold change (|log2FC|) ≥ 1 and a false discovery rate (FDR) of less than 0.01 as the cut-off for significant differential expression, a total of 115 genes were differentially expressed in C41(DE3), of which 68 genes were upregulated and 47 genes were downregulated (Supp Fig. S1a). In C43(DE3) a total of 239 genes were differentially expressed under the same conditions, of which 140 genes were upregulated and 99 were downregulated (Supp Fig. S1b). Remarkably, both strains had the same proportion of genes upregulated (59%) and downregulated (41%) overall. C43(DE3) had roughly twice as many differentially expressed genes as C41(DE3); however, there was a 33% overlap in upregulated genes (i.e. 52 of 156 unique genes) (Fig. 1a) and a 21% overlap in downregulated genes (i.e. 25 of 121 unique genes) (Fig. 1b), demonstrating some similarities between the two derivative strains.
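The overlap percentages quoted above follow from simple set arithmetic: the number of unique genes is the sum of the two per-strain counts minus the genes shared by both. A minimal Python sketch using only the counts reported in the text (the function name is illustrative and not part of the original analysis):

```python
def overlap_fraction(n_a: int, n_b: int, n_shared: int) -> tuple[int, float]:
    """Return (unique genes across both sets, shared fraction of that union)."""
    n_union = n_a + n_b - n_shared
    return n_union, n_shared / n_union


# Upregulated: 68 genes in C41(DE3), 140 in C43(DE3), 52 shared.
up_union, up_frac = overlap_fraction(68, 140, 52)
# Downregulated: 47 genes in C41(DE3), 99 in C43(DE3), 25 shared.
down_union, down_frac = overlap_fraction(47, 99, 25)

print(up_union, round(up_frac * 100))      # unique upregulated genes, % overlap
print(down_union, round(down_frac * 100))  # unique downregulated genes, % overlap
```
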
Transformed strains, induced with IPTG.
Given the use of the strains for recombinant protein production, we assessed the gene expression profiles when the strains were transformed with a plasmid, pACYCDuet-1, and grown in the presence of the inducer IPTG. The pACYCDuet-1 plasmid is a derivative of the P15A miniplasmid (29). It has a copy number of ~10 and carries the lacI gene, to provide control over gene expression, and a gene that confers chloramphenicol resistance for plasmid selection. We reasoned that the production of a membrane protein target would significantly influence the cell's transcriptomic response. Therefore, we decided that the host E. coli strains would not carry any specific "gene of interest" cloned into the plasmid, so as not to mask the features of C41(DE3) and C43(DE3) that optimize them for membrane protein expression.
Cells containing the pACYCDuet-1 vector were grown in LB growth medium (with chloramphenicol) to mid-log phase at 37 °C, before the addition of IPTG. Cells were then grown for a further two hours at 37 °C. At this point, the cells were collected and RNA was isolated and subjected to transcriptional analysis. Each experiment was performed in biological triplicate. All strains grew similarly before and after the addition of IPTG (Fig. 2a). To distinguish between the two sets of experimental parameters, the strains in this experiment were named BL21(DE3) EV+IPTG, C41(DE3) EV+IPTG and C43(DE3) EV+IPTG.
RNA libraries were prepared and sequenced, the transcripts analysed with Degust, and the data quality assessed statistically (Table S3). Significant changes were observed, with a total of 2018 genes identified as differentially expressed in C41(DE3) EV+IPTG, of which 1024 genes were upregulated and 994 genes were downregulated, as defined by a change in expression of |log2FC| ≥ 1 and FDR < 0.01. In C43(DE3) EV+IPTG a total of 1646 genes were differentially expressed under the same conditions, of which 827 genes were upregulated and 819 were downregulated (Fig. 2). Comparison of the differential gene expression in C41(DE3) EV+IPTG and C43(DE3) EV+IPTG shows largely similar expression profiles in volcano plots (Fig. 2b and 2c). Venn diagrams demonstrate that the majority of differentially expressed genes are common to both strains, making up 60% of differentially expressed genes in C41(DE3) (1206 of 2018 unique genes) and 73% of genes in C43(DE3) (1206 of 1646 unique genes) (Fig. 2d and 2e). Despite this, there are no similarities or overall patterns between C41(DE3) and C43(DE3) with respect to the genes showing the largest fold changes (Supp. Table S4).
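The significance cut-off used throughout (an absolute log2 fold change of at least 1 together with an FDR below 0.01) can be expressed as a small classification rule. The sketch below is illustrative only: the `GeneResult` type and `classify` function are hypothetical names, not part of Degust. The single worked value is the lacI result reported later in the text (log2FC -1.3, FDR 0.03), which fails the FDR criterion.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene: str
    log2fc: float
    fdr: float


def classify(result: GeneResult, lfc_cutoff: float = 1.0, fdr_cutoff: float = 0.01) -> str:
    """Label a gene 'up', 'down' or 'unchanged' under the stated thresholds."""
    # A gene must pass BOTH criteria to count as differentially expressed.
    if result.fdr >= fdr_cutoff or abs(result.log2fc) < lfc_cutoff:
        return "unchanged"
    return "up" if result.log2fc > 0 else "down"


# lacI in C43(DE3): large enough fold change, but the FDR is above the cut-off.
print(classify(GeneResult("lacI", -1.3, 0.03)))
```

Requiring both criteria is what keeps a gene with a greater than two-fold change but weak statistical support out of the differentially expressed set.
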
Metabolism pathways are significantly changed in C41(DE3) EV+IPTG and C43(DE3) EV+IPTG.
Many of the functional pathways appear to be very similar between C41(DE3) and C43(DE3). The differentially expressed genes in C41(DE3) EV+IPTG and C43(DE3) EV+IPTG were classified according to their COG pathways (Fig. 3) and also using their KEGG annotations (Supp Fig. 2). Genes that were significantly changed in both C41(DE3) EV+IPTG and C43(DE3) EV+IPTG were identified. The large majority of differentially expressed genes encode proteins involved in metabolism, particularly energy production and conversion (C), including TCA cycle genes (sucABCD), ATP biosynthesis (atpAGH) and respiration (nuoABCEFGIJKLM). A range of genes involved in amino acid metabolism were upregulated. These include genes in the biosynthetic pathways and metabolism of cysteine, methionine, tryptophan, tyrosine, phenylalanine, alanine and proline, amino acids that are used particularly for membrane protein biogenesis. We note that intermediates from the TCA cycle also feed directly into the pathways for leucine, isoleucine and valine biosynthesis (30).
Other areas significantly upregulated include carbohydrate transport and metabolism (G), including melibiose transporters (melA, melB), trehalose/glucose metabolism (otsA, otsB, treC) and pyruvate metabolism (pykA), and inorganic ion transport and metabolism (P), including taurine transport (tauA, tauD) and oligopeptide ABC transporters (oppB, oppC, oppD). Many of these genes encode membrane proteins, which we hypothesized might place demands for increased capacity on the membrane protein biogenesis pathway. In addition, genes mediating transcription/translation processes (K) and cell wall/membrane/envelope biogenesis (M) are upregulated with log2FC > 2 and FDR < 0.01 (Fig. 3). We find that in many of these pathways the same proportion of genes are being upregulated and downregulated concomitantly. This suggests there are global changes in play within specific pathways that are unique to the derivative strains, possibly affecting their response to membrane protein biogenesis (Fig. 3).
Adaptations for inner membrane biogenesis in C41(DE3) and C43(DE3)
We discovered that genes encoding several molecular chaperones and components of the membrane biogenesis pathway are transcribed at higher levels in C41(DE3) EV+IPTG and C43(DE3) EV+IPTG compared to BL21(DE3) EV+IPTG (Table 1). In Gram-negative bacteria, membrane protein biogenesis relies on protein translocases in the inner membrane (the SecYEG translocase for unfolded polypeptides or the TAT translocase for folded proteins) and on the outer membrane protein assembly machineries: BAM, the core β-barrel assembly machinery, and TAM, the translocation and assembly module. Periplasmic intermediates are maintained by a series of chaperones and proteases. Proteins destined for the inner membrane rely on cytoplasmic chaperones and protease inhibitors to ensure the translocation pathway remains efficient. The schematic in Figure 4 illustrates genes involved in this pathway of protein targeting to the Sec translocon for membrane protein folding and assembly. Many of these genes are upregulated in C41(DE3) and C43(DE3), demonstrating how these derivative strains can manage an onslaught of gene expression that would otherwise be toxic to the cell (Table 1, Fig. 4) (13,31). Table 2 shows that no significant changes were seen in genes encoding the membrane-embedded components of the SecYEG machinery, although there was significant upregulation of some of the TAT translocon components (tatC, tatD, tatE). The TAM components of the outer membrane protein assembly machinery were downregulated, while the BAM components remained unchanged.
Genes encoding the two chaperones of the Sec translocation pathway, SecA and SecB, are both upregulated, as are genes for molecular chaperones located in the cytoplasm (groEL, groES, dnaK and ybbN; Table 1) and periplasm (degP, degQ and fkpA; Table 2). The increased abundance of these chaperones could provide capacity to collect nascent membrane proteins prior to engagement with the membrane translocases, and to assist in the folding of the domains of the membrane proteins that protrude into the cytoplasm and periplasm.
Several genes involved in polysaccharide biosynthesis, which is essential for building the outer leaflet of the outer membrane surface (32), were upregulated. Genes in the retrograde phospholipid trafficking pathway, mlaABCD, were significantly downregulated. It remains untested but possible that these changes might reorganise membrane structure to be permissive for enhanced inner membrane protein accumulation.

Functionally unknown, unassigned and uncharacterized
Many of the genes that were differentially expressed are categorised as "functionally unknown" (S) using the COG annotation (Fig. 3), or as "unassigned" using the KEGG annotations (Supp Fig. 2). Since the COG annotations have not been updated for several years, we were interested in confirming the number of genes with still unknown function. The genes were compared to a recently published Y-ome: an updated list of every uncharacterized gene in E. coli K-12 MG1655 (33). From the total of 1024 genes upregulated in C41(DE3) EV+IPTG, 226 were assigned to group [S], the "uncharacterized" COG identifier (Supp Table S6). Of these 226 genes, 138 remain uncharacterized, as evidenced by their presence in the E. coli MG1655 Y-ome list of uncharacterized genes. This accounts for 14-18% of all differentially expressed genes in C41(DE3) EV+IPTG and C43(DE3) EV+IPTG (Supp Table S6). Further characterisation of these various genes will aid in the overall understanding of cellular responses to enhance membrane protein expression.
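The cross-check against the Y-ome amounts to a set intersection between the genes assigned to COG group [S] and the published list of uncharacterized genes. A minimal sketch, assuming both lists are available as sets of gene identifiers; the identifiers below are placeholders, not real gene names, and the function name is illustrative:

```python
def still_uncharacterized(cog_s: set[str], y_ome: set[str]) -> tuple[set[str], float]:
    """Return the COG [S] genes also present in the Y-ome, and their share of [S]."""
    shared = cog_s & y_ome
    return shared, len(shared) / len(cog_s)


# Placeholder identifiers standing in for real gene lists.
cog_s_genes = {"g1", "g2", "g3", "g4"}
y_ome_genes = {"g2", "g4", "g9"}

shared, fraction = still_uncharacterized(cog_s_genes, y_ome_genes)
print(sorted(shared), fraction)

# With the counts reported in the text: 138 of the 226 [S] genes are in the Y-ome.
reported_share = 138 / 226
```
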
How do the genetic mutations in C41(DE3) and C43(DE3) affect their transcriptome profile?
The genomes of C41(DE3) and C43(DE3) have been published, confirming the known mutations affecting T7RNAP and also identifying several other changes (14). We were interested in determining whether these mutations affected the expression of their corresponding genes. Both strains contain mutations in the lacUV5 promoter region of the T7RNAP gene that revert it back to a weaker form. A downregulation of T7RNAP is observed in all strains compared to their respective BL21(DE3) controls (Supp. Table S5). As discussed earlier, the rbsD IS3 excision causes upregulation of the rbs operon.
Genomic sequencing of C41(DE3) identified four unique changes not passed on to C43(DE3) (14; Supp. Table S5). Of the three genes containing a single amino acid change, there is a significant upregulation of melB and yhhA in our analysis in both C41(DE3) EV+IPTG and C43(DE3) EV+IPTG in comparison to BL21(DE3) EV+IPTG, but no significant change in ycgO expression (Supp. Table S5). MelB is a transporter of the sugar melibiose coupled with cation exchange (34) that has been shown to be affected by the composition of the inner membrane; thus, the changes may merely reflect a cellular response to an altered membrane environment (35). YhhA is an uncharacterised protein that contains a signal sequence, suggesting it localizes to the cell envelope; however, nothing more has been reported about this gene. C43(DE3) contains mutations in the genes dcuS, fur, cydA, yibJ, yjcO, lon and lacI (14). The majority of these mutations do not invoke any significant changes in the expression of the corresponding genes compared to the relevant BL21(DE3) controls. The gene encoding the ATP-dependent protease Lon is significantly upregulated due to the excision of an IS4 element that restores expression of lon (Supp. Table S5) (14,16). The Lon protease is associated with regulated protein degradation for the purpose of protein quality control (17,18). A point mutation in the lac repressor gene lacI, present in the DE3 region of C43(DE3), results in the downregulation of lacI expression in C43(DE3) compared to BL21(DE3), albeit not significantly according to our set parameters (log2FC -1.3, FDR 0.03; Supp Fig. S5). The mutant LacI in C43(DE3) has previously been suggested to be less responsive to its inducer allolactose (13), which would result in superior repression of the lac operon. In the presence of the vector pACYCDuet-1 and induction with IPTG, the lacI expression observed is masked by the contribution of the plasmid-encoded lacI.

Discussion
The expression of membrane proteins in bacteria remains a popular strategy in the quest to obtain large amounts of stable and folded recombinant protein for use in structural and functional studies. The C41(DE3) and C43(DE3) strains remain an initial port of call for the expression of membrane or toxic proteins owing to the anecdotal and published success of these strains over the past few decades since their generation.
The selection process for C41(DE3) and C43(DE3)
The C41(DE3) and C43(DE3) strains have been used to express membrane proteins from prokaryotic and eukaryotic sources. The genetic changes that occurred in the generation of these strains are limited to a handful of genes, particularly those involved in transcription and translation, and these changes were not informative as to the mechanism for increased membrane protein production. From their initial discovery, it was apparent that transcription was slower in these strains compared to their parent BL21(DE3) strain, and C43(DE3) also had a delayed onset of transcription after protein induction, which would arguably enable the protein to fold into the membrane (12,16). We hypothesise that during selection for C41(DE3) (and C43(DE3)), stress responses triggered by the demands of producing large quantities of a challenging target protein selected for changes in the basal expression parameters of the cell, including features that tune the strains' transcriptional program to enhance outcomes following addition of IPTG.
In this study, we have explored the global changes in transcriptomes by comparing the changes induced in the transcriptome of C41(DE3) and C43(DE3) with those in the parent strain BL21(DE3).
The majority of cellular adaptations occur under conditions for the induction of protein expression. As predicted, this did not require the inclusion of a "protein of interest". Upon addition of IPTG, the C41(DE3) and C43(DE3) strains activate multiple pathways/operons associated with protein production to a greater extent than the parental BL21(DE3). In particular, we found a significant increase in the expression of genes encoding molecular chaperones and proteases, as well as factors required for translocation into or across the inner membrane.
Re-tooling membrane protein biogenesis in C41(DE3) and C43(DE3)
Proteins destined for the inner membrane are typically targeted to the Sec translocon, either co-translationally or post-translationally in an unfolded state assisted by molecular chaperones, for vectorial integration into the inner membrane (36). YidC is another integral membrane protein that catalyzes the integration of membrane proteins into the inner membrane (37). Populations of YidC engage with the ribosome and with the SecYEG-SecDF-YajC complex to enhance membrane protein insertion (38-40). Our results showed that the strains C41(DE3) and, to a lesser extent, C43(DE3) had a moderate increase in the transcription of yidC, which would contribute to an increased capacity to integrate inner membrane proteins (41). Notably, the protein expressed to generate C43(DE3), subunit b from the F1F0-ATPase, is not a substrate of YidC and therefore does not require it during inner membrane integration (42). Genes like yidC are networked in transcriptional circuitry such that depletion of YidC in E. coli causes a change in the expression of ~250 genes of various functions, including energy metabolism; metabolite transport; protein folding and quality control; and translation and transcription. These genes were reported to overlap with genes regulated through the CpxAR-mediated stress pathway and the phage-shock response (43).
Previously, a comparative analysis of YidC production observed a significant decrease in cytosolic chaperones and proteases, including ClpB, IbpA and HslUV, in C41(DE3) and C43(DE3) compared with BL21(DE3)pLysS (16). YidC produced in BL21(DE3)pLysS was previously shown to be found largely in an aggregated form, which in turn induced significant levels of the aforementioned chaperones and proteases (10). These aggregated forms of YidC were not observed in C41(DE3) and C43(DE3), which would explain why the amounts of the cytosolic chaperones and proteases observed were also significantly lower (16). The proteomics analysis by Wagner et al. saw no significant change in the main components of the Sec translocon, a feature also seen in our own results.
How are these transcriptional networks activated?
The cell envelope stress response in E. coli is initiated during membrane protein production via two pathways: the two-component CpxAR system and the σE response. The σE (σ24) transcriptional program is controlled by the sigma factor encoded by the gene rpoE (44,45). In our study there was no upregulation of rpoE or of the major negative regulators rseA, rseB or rseC; however, several genes regulated by the cell envelope stress response and directly related to membrane protein folding are upregulated, such as the periplasmic proteases degP, degQ, degS and fkpA (46). The same scenario occurs in the CpxAR system, which also enacts a transcriptional response to various stresses including membrane-protein defects (47). In our results some of the stress-inducible operons (secA and araF) were upregulated, as were genes encoding chaperones generally under CpxAR control, including the protease/chaperone degP, the disulfide oxidase dsbA and yccA.
In a global assessment of the CpxAR system, De Wulf et al. (2002) identified 100 target operons, including the pps, aroF/aroK, rpoE/rseABC and secA operons (48). The study also showed that the signal transduction pathway coordinating the Cpx response interacts in unexpected ways with several other transcriptional control circuits.
Recombinant protein expression is another stress that triggers deployment of alternative sigma factors (49). For example, expression of a fusion protein consisting of the periplasmic maltose-binding protein and beta-galactosidase (MalE-LacZ) blocks the export of other proteins destined for secretion via the Sec pathway, and results in induction of secA, groEL and dnaK (50), and deletion of secB also triggers induction of dnaK, groEL, htpG, clpB (F84.1), grpE and groES (51). Many of these genes are coordinated by the σH (σ32) transcriptional program or heat-shock response, regulated by the rpoH gene (44,52-54). We saw many of these genes upregulated in C41(DE3) and C43(DE3) (as outlined in Table 1, Fig. 5).
The distinctions between C41(DE3) and C43(DE3)
Genetically and transcriptionally the C41(DE3) and C43(DE3) strains are largely similar. By way of comparison, we assessed the unique genes that were differentially expressed in C43(DE3). Only 150 genes unique to C43(DE3) were differentially expressed (Fig. 1). When looking at the gene distribution according to known COG pathways, many genes clustered within energy production and conversion, inorganic ion transport and metabolism, and carbohydrate transport and metabolism. A further 15% of genes/ORFs have no annotated function. Notably, C43(DE3) was isolated due to its ability to express subunit b of the F1F0-ATPase and, most importantly, assemble it into the inner membrane. The other candidate proteins tested for expression by Miroux and colleagues were all expressed in inclusion bodies (12). We postulate that the additional changes implemented by C43(DE3) may have contributed to the already present upregulation of functional pathways seen in C41(DE3) to enhance inner membrane protein biogenesis.

Concluding remarks
In addition to C41(DE3) and C43(DE3), other strains are gaining attention. For tighter control over the switch to induction of protein expression, Lemo21(DE3) is another BL21(DE3) derivative that expresses a T7RNAP inhibitor under the control of a titratable rhamnose promoter (16,55). Two other strains derived directly from BL21(DE3) are C44(DE3) and C45(DE3) (56). These bacterial hosts offer unique features to improve membrane protein expression, including tight repression of gene expression at 37 °C, tunable expression with increasing IPTG and continuous protein production throughout the exponential and stationary phases of growth. Sequencing of the C44(DE3) and C45(DE3) genomes identified the mutations responsible for these extensive cellular changes. In the future it will be interesting to explore the gene regulation that has evolved in these strains and compare this with C41(DE3) and C43(DE3), particularly via the Y-ome components. As we have shown here, understanding the breadth of phenotypic and functional changes that occur from very limited genetic differences will further aid researchers in their construction of an efficient and selective tool kit for their protein production strategies.

Methods
Bacterial strains, plasmids, and culture conditions
OverExpress C41(DE3) and C43(DE3) strains were purchased from Lucigen (cat# 60452-1), and the BL21(DE3) strain was purchased from Novagen (cat# 70235-3). Triplicate samples were grown overnight at 37 °C and used to inoculate 25 mL Luria broth (LB) cultures, which were grown to mid-log phase (OD600nm 0.4-0.6). For the second round of RNAseq samples, the three strains were transformed with the vector pACYCDuet-1 (Novagen), selected on LB agar plates containing 34 µg/mL chloramphenicol and grown overnight at 37 °C. Overnight cultures supplemented with chloramphenicol were grown in triplicate from three independent colonies at 37 °C. These were used to inoculate 25 mL LB cultures supplemented with chloramphenicol, which were grown to mid-log phase (OD600nm 0.4-0.6). For RNA extraction, ~1-1.5 mL of culture was mixed with two volumes of Bacteria Protect Reagent (QIAGEN), vortexed and incubated at room temperature (RT) for 5 min. Cells were pelleted (4,600 rpm, 10 min, RT) and then stored at -20 °C for RNA extraction.

RNA extraction
Total RNA was purified using the RNeasy Kit (QIAGEN Protocol 4 and Protocol 7 with on-column DNase treatment). Briefly, lysis buffer was added to the cells, followed by the RLT buffer and ethanol. Lysate was loaded onto the RNeasy spin column and centrifuged to bind material. The column was washed with buffer RW1. DNase I stock solution was added to the column and incubated for 15 min at RT. Additional RW1 buffer was added and the column was centrifuged again. The column was washed with RPE buffer and then the RNA was eluted with RNase-free water. RNA concentration and quality were assessed by A260/A280 readings on a Nanodrop, and the presence of degradation was assessed on an agarose gel. Further analysis of RNA, using fluorimetric quantitation by Invitrogen Qubit with the Invitrogen Quant-iT dsDNA HS Assay Kit and CE integrity analysis by Agilent Fragment Analyzer (FA) using the Agilent HS RNA kit, was performed by Micromon.

Transcriptomics and analysis
All cDNA library preparation and sequencing was performed by Micromon; however, two different platforms were used. For the first samples [BL21(DE3), C41(DE3), C43(DE3)], cDNA libraries were prepared using the Epicentre ScriptSeq Complete (bacteria) V2 library construction chemistry with 5000 ng of input RNA according to the manufacturer's instructions, and were then sequenced on an MGITech MGISEQ2000-RS with DNBSEQ chemistry V2 and a PE100 Customized V1 sequencing kit (paired-end 100 b reads, 1 sequencing lane (FCL)). For the second set of samples [BL21(DE3) EV+IPTG, C41(DE3) EV+IPTG, C43(DE3) EV+IPTG], cDNA libraries were also prepared using the Epicentre ScriptSeq Complete (bacteria) V2 library construction chemistry, with 2500 ng of input RNA, and were then sequenced on an Illumina NextSeq500 with SBS V2 chemistry (single-end 75 b reads, 1 sequencing lane (high-output), 0.85 pM loading concentration) according to the manufacturer's instructions. During this project, Micromon updated their sequencer, which is why the samples were analysed using two different platforms. For this reason, we have deliberately made no comparisons between the two sets of data.
The raw fastq files were analyzed using the RNAsik pipeline (26), in which the bwa mem aligner (57) was used to align reads to the BL21(DE3) reference genome CP001509.3. The reference GFF and FASTA files were downloaded from the RefSeq database. Reads were quantified with featureCounts (58), producing the raw gene count matrix and various quality control metrics, all summarised in a MultiQC report (59). The gene count matrix was analysed with Degust (28), a web tool that performs differential expression analysis using limma voom (60), with counts per million (CPM) normalisation for library size and trimmed mean of M values (TMM) normalisation (61) for RNA composition. Degust also produces several quality plots, such as classical multidimensional scaling (MDS) and MA plots. Differentially expressed genes were defined as those showing at least a 2-fold change in expression (log2 expression ratio ≥ 1) at a false-discovery rate (FDR) threshold of 0.01. Of note, in sample BL21(DE3) EV+IPTG_3 a large proportion of the sequenced data mapped to the vector pACYCDuet-1, leaving ~30% of the data available (Supp. Table 5). However, since all replicates behaved well on the MDS plot, the differential expression was taken as meaningful and reliable.
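The two simplest steps of this analysis, library-size scaling to CPM and the threshold test for differential expression, can be illustrated with a minimal Python sketch. It is not the Degust or limma voom implementation: it omits TMM normalisation and the voom model entirely, all gene names and numbers are hypothetical, and applying the fold-change cutoff symmetrically (to the absolute log2 ratio) is an assumption based on the study reporting both up- and downregulated genes.

```python
def counts_per_million(counts):
    """Scale one sample's raw read counts to counts per million (CPM)."""
    library_size = sum(counts.values())
    return {gene: n * 1e6 / library_size for gene, n in counts.items()}


def is_differentially_expressed(log2_fc, fdr, min_abs_log2_fc=1.0, max_fdr=0.01):
    """Threshold test: |log2FC| >= 1 and FDR <= 0.01 (symmetry is an assumption)."""
    return abs(log2_fc) >= min_abs_log2_fc and fdr <= max_fdr


# Hypothetical raw counts for a single sample (not data from the study).
sample_counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
cpm = counts_per_million(sample_counts)

# Hypothetical per-gene results: (gene, log2 fold change, FDR).
results = [
    ("geneA", 1.8, 0.002),   # passes both thresholds, upregulated
    ("geneB", -2.4, 0.0005), # passes both thresholds, downregulated
    ("geneC", 0.6, 0.0001),  # fold change too small
    ("geneD", 3.1, 0.2),     # FDR too high
]

degs = [g for g, lfc, fdr in results if is_differentially_expressed(lfc, fdr)]
up = [g for g, lfc, fdr in results if is_differentially_expressed(lfc, fdr) and lfc > 0]
down = [g for g, lfc, fdr in results if is_differentially_expressed(lfc, fdr) and lfc < 0]
```

A gene is only counted when it passes both criteria, which is why a large fold change with a poor FDR (geneD here) is excluded.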
Genes were mapped using eggNOG-mapper (62,63) to multiple identifiers, including COG, KEGG and GO annotations. Genes assigned the COG identifier [S] or left unannotated were further investigated for function using the recently published Y-ome list (33).
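As a rough illustration of how such annotations might be summarised, the sketch below tallies genes per COG category and collects those that would need follow-up (category S or no annotation). The input format is an assumption: a mapping from gene name to a COG category string, where a string may hold several letters and "-" or an empty value means unannotated. The gene names are hypothetical and the function is not part of eggNOG-mapper.

```python
from collections import Counter


def tally_cog_categories(gene_to_cog):
    """Count genes per COG letter and list genes needing functional follow-up."""
    counts = Counter()
    needs_followup = []
    for gene, cog in gene_to_cog.items():
        # Keep only category letters; "-", "" and None yield an empty list.
        letters = [c for c in (cog or "") if c.isalpha()]
        if not letters:
            counts["unannotated"] += 1
            needs_followup.append(gene)
            continue
        # A gene with several letters contributes to each of its categories.
        counts.update(letters)
        if "S" in letters:  # S = function unknown
            needs_followup.append(gene)
    return counts, needs_followup


# Hypothetical annotations (not data from the study).
annotations = {"geneA": "C", "geneB": "EG", "geneC": "S", "geneD": "-", "geneE": None}
cog_counts, followup_genes = tally_cog_categories(annotations)
```

The follow-up list is the set of genes that would then be checked against a resource such as the Y-ome list.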

Bacterial strains, plasmids, and culture conditions
OverExpress C41(DE3) and C43(DE3) strains were purchased from Lucigen (cat# 60452-1), and the BL21(DE3) strain was purchased from Novagen (cat# 70235-3). Triplicate samples were grown overnight at 37 °C, used to inoculate 25 mL Luria broth (LB) cultures and grown to mid-log phase (0.4-0.6). For the second round of RNAseq samples, the three strains were transformed with the vector pACYCDuet-1 (Novagen), selected on LB agar plates containing 34 µg/mL chloramphenicol and grown overnight at 37 °C. Overnight cultures supplemented with chloramphenicol were grown in triplicate, from three independent colonies, at 37 °C. These were used to inoculate 25 mL LB cultures supplemented with chloramphenicol, which were grown to mid-log phase (0.4-0.6). For RNA extraction, ~1-1.5 mL of culture was mixed with two volumes of Bacteria Protect Reagent (QIAGEN), vortexed and incubated at room temperature for 5 min. Cells were pelleted (4,600 rpm, 10 min, RT) and then stored at -20 °C for RNA extraction.


Figures
Venn diagrams illustrating the differentially expressed genes (DEGs) that are (a) upregulated and (b) downregulated in C41(DE3) and C43(DE3), compared to gene expression in their parent strain BL21(DE3). Differential expression is defined by a change in expression with a log2FC ≥1 and FDR ≤0.01. (c, d) Differential gene expression organised by COG classification in (c) C41(DE3) (blue) and (d) C43(DE3) (red) (Table 1).
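The regions of a two-set Venn diagram like those in panels a and b can be derived with plain set operations. The following Python sketch shows the idea under stated assumptions: the gene names are hypothetical placeholders, and the function is illustrative rather than the tool used to draw the figure.

```python
def venn_partition(set_a, set_b):
    """Split two gene sets into the three regions of a two-set Venn diagram."""
    return {
        "only_a": set_a - set_b,  # DEGs unique to the first strain
        "shared": set_a & set_b,  # DEGs common to both strains
        "only_b": set_b - set_a,  # DEGs unique to the second strain
    }


# Hypothetical upregulated gene sets relative to BL21(DE3) (not study data).
c41_up = {"geneA", "geneB", "geneC"}
c43_up = {"geneB", "geneC", "geneD"}

regions = venn_partition(c41_up, c43_up)
region_sizes = {name: len(genes) for name, genes in regions.items()}
```

The same call applied to the downregulated sets would give the counts for the second diagram.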