Comparative genomic analysis of norvancomycin-producing strains
The complete genomes of the industrial producing strain A. orientalis NCPC 2-48 and the original strain A. orientalis CPCC 200066 each consist of a circular chromosome of 9.5 Mb with a G + C content of 68.84% (Figure 1C). Genome analysis of the industrial strain showed that it contains 8,705 genes with a total length of 8,579,274 bp, accounting for 90.41% of the genome. It also contains 404 tandem repeat sequences (35,253 bp in total, 0.3715% of the genome), 325 minisatellite DNAs, 11 microsatellite DNAs, 50 tRNAs and 12 rRNAs.
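The exact chromosome length is not stated here, but each reported percentage implies one, which offers a quick consistency check on these figures. The sketch below (the helper name is hypothetical) back-calculates the genome length from the total gene length and from the tandem-repeat length; because the percentages are rounded, the results are approximate.

```python
def implied_genome_length(feature_bp: int, percent_of_genome: float) -> float:
    """Genome length (bp) implied by a feature of feature_bp making up percent_of_genome %."""
    return feature_bp / (percent_of_genome / 100.0)

# Values reported above for NCPC 2-48.
from_genes = implied_genome_length(8_579_274, 90.41)   # total length of the 8,705 genes
from_repeats = implied_genome_length(35_253, 0.3715)   # total length of the 404 tandem repeats
print(f"{from_genes / 1e6:.2f} Mb vs {from_repeats / 1e6:.2f} Mb")
```

Both estimates come out at about 9.49 Mb, in line with the 9.5 Mb chromosome described above.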
Comparison of the NCPC 2-48 genome with that of CPCC 200066 showed that the two sequences are extremely similar, sharing 99.97% identity. No large fragment duplication or deletion was found across the NCPC 2-48 genome relative to CPCC 200066. Moreover, the internal structure of the chromosome and the gene order were largely conserved, with no rearrangements in the NCPC 2-48 genome. The codon usage of the 50 tRNA genes is also nearly identical in the two strains. With respect to norvancomycin biosynthesis, further comparative analysis of the secondary metabolism gene clusters showed that no duplication, SVs (structural variations) or InDels (insertions and deletions of small fragments, ≤50 bp) occurred in the nvcm biosynthetic gene cluster of the industrial strain. This suggests that the high yield of the industrial strain is caused neither by an increased copy number of the nvcm biosynthetic gene cluster nor by mutations (SVs or InDels) within the cluster.
The genomic difference between the two strains is mainly accounted for by three SVs in strain NCPC 2-48: two deletions of 965 bp (SV1) and 12,212 bp (SV2) and one insertion of 12,076 bp (SV3) (Figure 1C, Table S3). All of these SVs lie in coding DNA sequence (CDS) regions far from the nvcm biosynthetic gene cluster. These deleted and inserted fragments involve 34 protein-coding genes, including 3 regulators, 1 transporter, 3 transposases, 9 other enzymes and 18 unknown proteins (Table S3). Information on the genes encoded within the three SVs is shown in Table S4.
The first structural variation, SV1, occurred in gene B37_4355 (Table S4): a 965-bp segment of B37_4355 is deleted in the NCPC 2-48 genome. SV2 is a 12,212-bp deletion in the NCPC 2-48 genome, corresponding to the region from gene B37_6566 to B37_6583 on the chromosome of the original strain, which contains 17 possible CDSs (Table S4). Further analysis of the sequences flanking SV2 revealed a lacI-family transcriptional regulatory gene located upstream. LacI-family regulators usually control the expression of key enzymes involved in carbon metabolism and regulate the transcription of a series of downstream genes, including some transcription factors. Deletion of the genes downstream of lacI might block LacI-mediated regulation and enhance glucose catabolism in primary metabolism, which could favor synthesis of the sugar precursors of norvancomycin. SV3 is a 12,076-bp insertion containing 16 possible CDSs (Table S4). Functional analysis of the flanking genes revealed several key enzymes of primary metabolism, such as acetyl-CoA dehydrogenase and acetyl-CoA synthetase, downstream of SV3.
In addition to the three SVs, 216 InDels are present in the genome of the high-yield strain. These InDels fall within the coding regions of several important enzymes related to primary and secondary metabolism, as well as transcription factors such as LacI, LysR, TetR, MerR, YebC/PmpR and SARP family proteins and two-component regulators (Tables S5 and S6). InDels in these transcription factors may affect norvancomycin production through as-yet unknown regulatory mechanisms; however, the functions of these transcription factors remain unexplored. Based on functional annotation, none of the SVs or InDels contains a functionally characterized gene directly related to the biosynthetic pathway of norvancomycin or its precursors. Thus, the genomic variations of the industrial strain do not offer a simple explanation for its high norvancomycin yield. Because some key enzymes of primary metabolism were found in these SVs and InDels or in their flanking segments, we speculate that these mutations may change metabolic flux by affecting the expression of important primary-metabolism enzymes. Moreover, more than ten regulatory genes were detected in the SVs and InDels. Through the loss, insertion or mutation of these regulatory genes, the overall regulation of primary and secondary metabolism and cell growth may be altered in the high-yield strain, ultimately making the overall metabolic flux more favorable for norvancomycin biosynthesis.
Transcriptomic profiling of norvancomycin-producing strains
To further investigate the mechanism of high-yield norvancomycin production, transcriptomic analysis of the original and industrial strains was carried out at three time points (12 h, 24 h and 48 h). RNA from NCPC 2-48 and CPCC 200066 was extracted and sequenced, generating an average of 23,578,714 raw reads. After removal of low-quality reads, an average of 23,523,435 clean reads remained, and the average mapping rates of clean reads to the reference genes and the reference genome were 80.54% and 96.69%, respectively. Sequencing statistics for each sample are shown in Table S7. Pairwise differentially expressed gene (DEG) analyses revealed more than 2,000 DEGs, about one-fourth of all genes in the genome, with significantly lower or higher transcript abundance (fold change (FC) > 2 and false discovery rate (FDR) ≤ 0.001) at each time point (12 h, 24 h and 48 h) in NCPC 2-48 relative to CPCC 200066 (Fig. 2A). The transcriptional levels of several genes within the nvcm cluster were verified by RT-qPCR (Fig. S1). Hierarchical clustering of the DEGs showed similar expression patterns at 24 h and 48 h (Fig. 2B), with more than two-thirds of the DEGs upregulated in the industrial strain. Interestingly, more genes were downregulated at 12 h than at the other two time points (Fig. 2B), and 105 genes were downregulated at 12 h but upregulated at 24 h and 48 h. A KEGG pathway search showed that 35 of these 105 genes are located in the nvcm cluster. Biosynthetic pathways of secondary metabolites are usually activated in a growth phase-dependent manner, so that expression of secondary metabolism genes coincides with the onset of stationary phase during liquid fermentation of microorganisms.
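The DEG thresholds above can be illustrated with a minimal sketch. It assumes that per-gene p-values are already available from a statistical test (not shown), that the FDR is obtained with the Benjamini-Hochberg procedure, that "FC > 2" applies in either direction (|log2 FC| > 1), and that a pseudocount of 1 is added to FPKM values; none of these details is specified here. The function names, gene identifiers, FPKM values and p-values are hypothetical.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_degs(genes, fc_cutoff=2.0, fdr_cutoff=0.001, pseudocount=1.0):
    """genes: list of (gene_id, fpkm_industrial, fpkm_original, p_value).
    Returns {gene_id: "up" or "down"} for genes with |log2 FC| > log2(fc_cutoff)
    and FDR <= fdr_cutoff. The pseudocount (an assumption) avoids division by zero."""
    fdrs = benjamini_hochberg([g[3] for g in genes])
    calls = {}
    for (gene_id, fpkm_ind, fpkm_orig, _), fdr in zip(genes, fdrs):
        log2fc = math.log2((fpkm_ind + pseudocount) / (fpkm_orig + pseudocount))
        if abs(log2fc) > math.log2(fc_cutoff) and fdr <= fdr_cutoff:
            calls[gene_id] = "up" if log2fc > 0 else "down"
    return calls

# Hypothetical values for four genes (industrial vs. original strain).
example = [
    ("geneA", 80.0, 10.0, 1e-8),  # ~7-fold higher, strong evidence
    ("geneB", 5.0, 20.0, 1e-6),   # ~3.5-fold lower
    ("geneC", 30.0, 20.0, 1e-9),  # below the 2-fold cutoff
    ("geneD", 50.0, 10.0, 0.2),   # large change but not significant
]
print(call_degs(example))  # {'geneA': 'up', 'geneB': 'down'}
```

Dedicated RNA-seq tools model read counts directly rather than FPKM ratios; the sketch only shows how the two thresholds combine.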
In the high-yield strain, transcriptional levels of the norvancomycin biosynthetic genes are lower in the early growth stage (12 h) and then rise abruptly from 24 h to 48 h, suggesting that norvancomycin biosynthesis is more strictly controlled across growth stages. Because the gene expression patterns at 24 h and 48 h were similar, and 24 h had the most DEGs (2,039 upregulated and 628 downregulated), we analyzed KEGG pathway enrichment of the DEGs at 24 h. Enrichment analysis showed that a total of 1,764 DEGs were annotated to 150 metabolic pathways, most of them related to primary metabolism, such as nitrogen metabolism and arginine biosynthesis, or to the biosynthesis of secondary metabolites, such as norvancomycin, tetracyclines and other type II polyketides; the enriched pathways also included degradation of naphthalene and aromatic compounds, and tyrosine and inositol phosphate metabolism. The top 20 KEGG enrichment results are shown in Fig. 2C.
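KEGG enrichment of this kind is commonly assessed with a one-sided hypergeometric test. The sketch below assumes that test (the statistic actually used is not named here) and computes the probability of observing at least k DEGs in a pathway by chance. Apart from the 1,764 annotated DEGs, the numbers in the example (5,000 annotated background genes, a 30-gene pathway with 25 DEG hits) and the function name are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k), X ~ Hypergeometric(N, K, n).

    N: annotated background genes; K: background genes in the pathway;
    n: annotated DEGs; k: DEGs that fall in the pathway.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic until the final division

# Hypothetical pathway: 30 genes, 25 of them among the 1,764 annotated DEGs.
p = hypergeom_enrichment_p(k=25, n=1764, K=30, N=5000)
expected = 1764 * 30 / 5000  # hits expected by chance
print(f"expected hits: {expected:.1f}, p = {p:.2e}")
```

When many pathways are tested, the resulting p-values would normally be corrected for multiple testing, for example with the Benjamini-Hochberg procedure.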
In particular, visualization of the transcriptome (Fig. 2D) showed that all key genes of the nvcm cluster, including those for biosynthesis of the building blocks Bht (VcmD, OxyD, Vhp), Hpg (Pdh, HmaS, HmO, HpgT), Dpg (DpgA/B/C/D, HpgT) and vancosamine (VasA/B/C/D/E), heptapeptide assembly (VcmA/B/C) and post-assembly modification (OxyA/B/C, Vhal, GtfD/E), were transcribed at significantly higher levels in the high-yield strain than in the original strain at 24 h and 48 h (Fig. 3, Table S8). This suggests that the increased transcription of the norvancomycin biosynthetic genes (1.9- to 8.6-fold upregulation at 24 h and 3.0- to 18.3-fold upregulation at 48 h) directly promotes the high norvancomycin yield of NCPC 2-48. In addition, primary pathways for the production of amino acids (Leu, Asn, Tyr) and glucose were also upregulated (Fig. 3, Table S9), including genes B37_4517 and B37_7337 for prephenate (the precursor of Hpg); B37_6997, B37_3479 and B37_6785 for Tyr (the precursor of Bht); B37_7779 for malonyl-CoA (the precursor of Dpg); B37_4701, B37_8154 and B37_2225 for Leu; B37_7110 and B37_3479 for Asn and its precursor Asp; and B37_7117 (RfbA) for TDP-D-glucose (the vancosamine precursor). These results suggest that every stage of norvancomycin biosynthesis was significantly upregulated in the industrial production strain, from the abundant supply of amino acid and glucose precursors to NRPS assembly and post-assembly modification of the glycopeptide antibiotic.
AoStrR1 and AoLuxR1 positively regulate the biosynthesis of norvancomycin
The expression of antibiotic biosynthetic genes is usually regulated by cluster-specific regulators encoded within the gene cluster. Four putative regulatory genes (AoLuxR1, AoStrR1, AoTetR1, AoAraC1) are located within or adjacent to the nvcm cluster in the norvancomycin-producing strains. Homologues of all four genes are also present in the vancomycin-producing strain A. orientalis KCTC 9412T (Fig. S2), whereas another vancomycin producer, A. keratiniphila HCCB 10007, contains homologues of only AoLuxR1, AoStrR1 and AoTetR1 (Fig. S2). Differential expression analysis revealed that the regulatory genes AoLuxR1 and AoStrR1 were significantly upregulated in the high-yield strain, following the same trend as the structural genes of the nvcm cluster (Fig. 4A, Table S8). Transcription of these two regulatory genes increased from 24 h onward, reaching 23.3-fold and 5.8-fold higher levels, respectively, at 48 h (Fig. 4A, Table S8). AoTetR1 and AoAraC1, located near the nvcm cluster, showed 2- to 3-fold higher transcription in the industrial strain than in the original strain at 12 h, 24 h and 48 h, but followed a different trend from the structural genes of the nvcm cluster, and their transcription levels (fragments per kilobase of exon model per million mapped fragments, FPKM) were much lower than those of the other genes in the nvcm cluster (Fig. 4A, Table S8).
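FPKM normalizes fragment counts by both gene length and sequencing depth, which is what allows the transcript levels of AoTetR1 and AoAraC1 to be compared with those of other nvcm genes. A minimal sketch of the standard definition follows; the function name and fragment counts are hypothetical, and the 966-bp length is that of AoStrR1.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of gene per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)

# Hypothetical counts for one gene in two libraries of different sequencing depth.
industrial = fpkm(fragments=1200, gene_length_bp=966, total_mapped_fragments=22_000_000)
original = fpkm(fragments=200, gene_length_bp=966, total_mapped_fragments=24_000_000)
print(f"fold change: {industrial / original:.1f}")
```

Because both length and depth are divided out, the ratio of two FPKM values for the same gene gives the fold change between strains.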
The AoStrR1 gene is located in the nvcm cluster (7,787,412 to 7,788,377 nt) and is 966 bp long, encoding the 321-amino-acid AoStrR1 protein. Alignment against the amino acid sequences encoded by 22 BGCs of known glycopeptide antibiotics from GenBank (Table S10) showed that homologues of AoStrR1 are present in almost all of these clusters (Fig. S3A, Table S11), the only exception being the BGC of corbomycin, a newly discovered compound. These amino acid sequences were used to build a phylogenetic tree with MEGA-X using the neighbor-joining method. AoStrR1 grouped with the regulators encoded in the BGCs for vancomycin, balhimycin, chloroeremomycin and A40926 rather than with the clade of Tcp28 (from the teicoplanin BGC) (Fig. S3A). Among them, Bbr and Dbv4, confirmed positive regulators of the balhimycin and A40926 clusters, respectively, are highly homologous to AoStrR1, with 84% and 80% identity, respectively (Table S11). We therefore speculate that AoStrR1 might be a pathway-specific regulator of the norvancomycin biosynthetic genes and that its increased transcription may be an important trigger of norvancomycin production.
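Identity values such as the 84% and 80% quoted above depend on the alignment and on how gap columns are counted. The sketch below computes percent identity from two already-aligned sequences under one common convention (identical residues divided by all alignment columns, gaps included, as in the "Identities" line of BLAST output); the function name and the aligned fragments are hypothetical and are not real AoStrR1 or Bbr sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Convention assumed here: identical residue pairs divided by all alignment
    columns, gap columns included (other tools exclude gaps or normalize by
    the shorter sequence, which gives different numbers).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * identical / len(aligned_a)

# Hypothetical aligned fragments: 6 identical residues over 8 columns.
print(f"{percent_identity('MKV-LTRE', 'MKVALSRE'):.1f}%")  # 75.0%
```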
The AoLuxR1 gene is located close to the right border of the nvcm cluster (7,724,319 to 7,724,996 nt) and is 678 bp long, encoding the 225-amino-acid AoLuxR1 protein. Fourteen putative LuxR-like regulators were identified in or adjacent to the 22 glycopeptide BGCs (Fig. S3B, Table S12), and the LuxR phylogenetic tree revealed two main clades (Fig. S3B). AoLuxR1 appeared to group with the regulators encoded in the BGCs for vancomycin, decaplanin, keratinimicin and nogabecin, sharing 83% to 98% identity with them (Fig. S3B, Table S12). The other clade included Dbv3 (from the A40926 BGC) and Tcp29 (from the teicoplanin BGC), which are larger (more than 500 amino acids) and share low identity with AoLuxR1 (Fig. S3B, Table S12). These results suggest that the function or regulatory targets of AoLuxR1 may differ from those of the well-characterized Dbv3 and Tcp29. Although no AoLuxR1 homologue was found in the well-known balhimycin BGC of A. balhimycina DSM 5908 (no sequences of the adjacent ORFs are available in GenBank), AoLuxR1 is highly conserved in the two reported vancomycin-producing strains (Fig. S2, Table S12).
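The gene and protein lengths reported for AoStrR1 and AoLuxR1 follow directly from their coordinates, assuming the coordinates are 1-based, inclusive and include the stop codon. The hypothetical helper below reproduces both values under that assumption.

```python
def cds_to_protein_length(start: int, end: int) -> tuple[int, int]:
    """Return (CDS length in bp, protein length in aa) for 1-based, inclusive
    coordinates that include the stop codon (an assumption about the coordinates)."""
    length_bp = end - start + 1
    if length_bp % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    # Every codon except the final stop codon encodes one amino acid.
    return length_bp, length_bp // 3 - 1

print(cds_to_protein_length(7_787_412, 7_788_377))  # AoStrR1: (966, 321)
print(cds_to_protein_length(7_724_319, 7_724_996))  # AoLuxR1: (678, 225)
```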
To determine the regulatory functions of AoStrR1 and AoLuxR1, we constructed AoStrR1 and AoLuxR1 overexpression plasmids based on the pULVK2A vector, placing each gene under either its native promoter or ermE*p, a strong constitutive promoter, and introduced them into A. orientalis CPCC 200066 by conjugation. Norvancomycin yields in the fermentation broths of the AoStrR1- and AoLuxR1-overexpressing strains were determined by HPLC and LC-MS. Overexpression of either AoStrR1 or AoLuxR1 increased norvancomycin production, especially under ermE*p, which led to 2- to 5-fold higher norvancomycin yields (Fig. 4B, D). To confirm the roles of AoStrR1 and AoLuxR1 in transcriptional regulation of the nvcm cluster, gene expression in the overexpression strains was analyzed by reverse transcription quantitative PCR (RT-qPCR). As expected, transcripts of seven norvancomycin biosynthetic genes, vcmA, oxyA, oxyB, vhp, hmaS, vasA and ald, increased significantly in both overexpression strains (Fig. 4C, E). These results indicate that both AoStrR1 and AoLuxR1 act as activators of norvancomycin biosynthesis.
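How the RT-qPCR fold changes were calculated is not specified here; a common choice is the 2^-ΔΔCt method, which normalizes each target gene to a reference gene and then to the control strain, and assumes roughly 100% amplification efficiency. A minimal sketch under that assumption follows; the function name and all Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_test: float, ct_ref_test: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression (test vs. control) by the 2^-ΔΔCt method.

    Assumes both primer pairs amplify with ~100% efficiency (doubling per cycle).
    """
    delta_ct_test = ct_target_test - ct_ref_test          # normalize to the reference gene
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_test - delta_ct_control     # normalize to the control strain
    return 2 ** -delta_delta_ct

# Hypothetical Ct values for one biosynthetic gene in an overexpression strain (test)
# and in the parent strain CPCC 200066 (control), each with the same reference gene.
print(fold_change_ddct(22.0, 18.0, 24.0, 18.0))  # 4.0
```

A target that reaches threshold two cycles earlier (after reference normalization) corresponds to about 4-fold more transcript under the doubling assumption.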
AoStrR1 binds to eight promoter regions in the nvcm cluster
Cluster-specific regulatory proteins generally activate transcription by binding to the promoter regions of one or more structural genes within a biosynthetic gene cluster, thereby promoting biosynthesis of the secondary metabolite and ultimately increasing its yield. To identify potential target genes of AoStrR1 and AoLuxR1, we attempted to express and purify His-tagged AoStrR1 and AoLuxR1 in E. coli BL21(DE3) for electrophoretic mobility shift assays (EMSAs). Unfortunately, despite testing different expression conditions, no soluble His-tagged AoLuxR1 was detected in the supernatant of recombinant E. coli BL21(DE3). Soluble AoStrR1 was expressed and purified in E. coli BL21(DE3) as a fusion protein with an N-terminal His10 tag (Fig. 5A) and checked by western blot with an anti-His-tag antibody. In addition to the intact His-tagged AoStrR1, a smaller His-tagged protein band was detected (Fig. S4), suggesting that a fraction of AoStrR1 is degraded at its C-terminus. Because the DNA-binding domain of AoStrR1 is predicted to lie in its C-terminal region, the truncated protein is presumably unable to bind promoter regions and thus should not interfere with formation of complexes between the intact protein and the DNA fragments in subsequent EMSA experiments.
To identify AoStrR1 target genes, 20 intergenic regions of the norvancomycin cluster, covering the promoters of almost all structural genes and putative regulatory genes, were amplified and labeled with biotin for use as EMSA probes (200 to 500 bp). Recombinant His10-AoStrR1 formed stable complexes with the promoter regions upstream of vanY, AoStrR1, oxyA, oxyB, vhp, hmaS, vasA and ald (Fig. 5B, D). Addition of a 100-fold excess of unlabeled specific competitor DNA attenuated the shifted bands, indicating that binding of His10-AoStrR1 to these probes is specific (Fig. 5B, D). Analysis of the eight AoStrR1 binding sites with GLAM2 and MEME identified the consensus AoStrR1 binding sequence as GTCCAN18TTGGAC, which contains an imperfect palindrome (Fig. 5C). This motif is highly similar to the reported consensus sequences of the AoStrR1 homologues Bbr, Dbv4 and Tcp28 (Tei15*) [20, 21, 23], suggesting that StrR-like regulators may share a conserved regulatory mechanism across different glycopeptide BGCs.
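The consensus GTCCAN18TTGGAC could be used to scan other promoter regions for candidate AoStrR1 sites. The sketch below is a simplified, hypothetical approach: it searches both strands for exact matches to the consensus with a regular expression, whereas motifs from GLAM2 and MEME are probabilistic and would also score degenerate sites. The function and variable names and the test sequence are illustrative only.

```python
import re

# Consensus from the text: GTCCA, any 18 nt, then TTGGAC (29 nt in total).
# The lookahead lets overlapping matches be reported.
SITE_LEN = 29
MOTIF = re.compile(r"(?=(GTCCA[ACGT]{18}TTGGAC))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based leftmost position on the input sequence, strand) for exact matches."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in MOTIF.finditer(seq)]
    for m in MOTIF.finditer(reverse_complement(seq)):
        # Map the match on the reverse complement back to input-sequence coordinates.
        hits.append((len(seq) - m.start() - SITE_LEN, "-"))
    return sorted(hits)

# Hypothetical promoter fragment carrying one site on the + strand.
probe = "AAAA" + "GTCCA" + "C" * 18 + "TTGGAC" + "AAAA"
print(find_sites(probe))  # [(4, '+')]
```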
Based on the RT-qPCR and EMSA results, AoStrR1 might be the overall positive regulator of norvancomycin production, binding directly to eight promoters in the nvcm cluster, including those of the genes and operons responsible for Bht biosynthesis (Vhp), Hpg biosynthesis (HmaS), vancosamine biosynthesis (VasA), cyclization of the linear heptapeptide (OxyAB) and self-resistance to norvancomycin (VanY). In addition, AoStrR1 bound to its own promoter but not to the promoters of the other three regulators described here (AoLuxR1, AoTetR1 and AoAraC1; Fig. S5). Comparison of the regulatory targets of the reported StrR-like regulators in different glycopeptide BGCs shows that, except for Tcp28 (Tei15*), the other three regulators (AoStrR1, Bbr and Dbv4) control one common biosynthetic step, heptapeptide cyclization (oxyA), and that both AoStrR1 and Bbr bind to the promoter regions of genes responsible for specialized amino sugar biosynthesis (vasA/dvaA) in their respective gene clusters. Unlike Dbv4 and Tcp28 (Tei15*), Bbr and AoStrR1 bind to their own upstream regions. This suggests that positive autoregulation of AoStrR1 might be responsible for the significant upregulation of norvancomycin production in the industrial strain. Moreover, among the four StrR-family regulators discussed above, only AoStrR1 binds upstream of vanY, one of the putative self-resistance genes. Given that the nvcm cluster lacks the VanRS self-resistance regulatory system, it is tempting to speculate that AoStrR1 is also involved in regulating self-resistance.