DOI: https://doi.org/10.21203/rs.3.rs-1637072/v1
Background: Streptomyces are known for their ability to produce a great variety of antibiotics and other bioactive compounds. The production of these molecules is temporally and genetically coordinated with bacterial morphological changes, which are controlled by transcriptional regulators. The bldD gene has been identified as one of the key players in the complex morphogenesis of Streptomyces and as an activator of antibiotic production. Alongside laboratory-based experimental work, genome mining and in silico analysis of the transcription start sites, promoter regions, transcription factors and their binding sites, and CpG islands of the bldD gene of antibiotic-producing Streptomyces species are fundamental steps toward understanding the regulatory mechanisms and their impact on antibiotic production.
Results: Our study identified the most important promoters in the regions upstream of the coding sequence of the bldD gene in 13 antibiotic-producing Streptomyces species. All 13 bldD genes (13/13, 100%) have a single transcription start site (TSS) flanking the coding region. The MEME algorithm revealed five motifs (MtS1-5), of which Motif 1 (MtS1) has the lowest E-value and is the key regulatory motif for bldD genes among the discovered motifs. Using the TOMTOM web program, we identified 13 transcription factors with a capacity to bind MtS1. The CpG island analysis of the bldD gene of the antibiotic-producing Streptomyces species indicated a low number of CpG islands. Phylogenetic analysis showed that the bldD genes of the antibiotic-producing species considered in this study are very closely related to those of other groups of Streptomyces.
Conclusions: Our study showed that the regulatory elements of bldD genes in antibiotic-producing Streptomyces are located closely upstream of the genes. A detailed understanding of these regulatory elements of the gene that encodes the key activator of antibiotic biosynthesis in Streptomyces species will enhance laboratory-based experiments for the production of antibiotics.
Most of the bioactive ingredients used in medicine, both today and in the past, come from natural sources such as microorganisms. Natural products can provide new structures that have biologically beneficial properties [1]. The actinomycetes are potential producers of antibiotics and other therapeutically useful compounds [2]. The vast majority of these metabolites (70%) have been isolated from actinomycetes, with the remaining 20% from fungi, 7% from Bacillus, and 1–2% from Pseudomonas. Hence, the actinomycetes are perhaps the most important group of organisms studied extensively in programs for the discovery of drugs and other bioactive metabolites [3].
Among the actinomycetes, streptomycetes are the major antibiotic-producing organisms utilized by the pharmaceutical industry because they produce many thousands of bioactive compounds, many of which are secondary metabolites that are strong antibiotics [3,4]. Despite significant developments in the chemical synthesis and engineered biosynthesis of antibacterial compounds, nature remains the richest and most versatile source of novel antibiotics at a low cost [2]. Although thousands of antibiotics have been identified to date, only a small proportion of them are useful for people and animals because of their toxicity. To address this concern, researchers are looking for novel antibiotics that are effective and free of harmful side effects.
Antibiotic resistance is another serious health concern. The need for novel antibiotics is emphasized by the rapid evolution of drug resistance in pathogenic bacteria, particularly multidrug-resistant pathogens [5]. The potential of the genus Streptomyces to produce commercially useful compounds remains critical because of the relatively large DNA complement of these bacteria [6]. Streptomyces have received considerable medical and commercial attention for three important reasons: i) they are abundant and prominent in soil; ii) they have a fairly wide phylogenetic distribution; and iii) they are among nature's most capable chemists, producing a remarkable range and diversity of bioactive secondary metabolites [7].
Streptomycetes are known for producing bioactive compounds in a manner that overlaps with their developmental program. It is therefore important to understand the signals and mechanisms that trigger both processes. Extensive genome analyses have revealed the regulators required for the onset of development and the activation of drug production. One such regulator of growth and activator of active metabolites is known as BldD, because mutations in the gene encoding this regulatory factor, bldD, prevent the growth of the reproductive aerial hyphae that give colonies their fuzzy appearance and abolish the production of antibiotics [8,9,10].
In many Streptomyces strains, genome sequencing has led to the discovery of multiple potential gene clusters engaged in secondary metabolite synthesis. Each antibiotic is commonly biosynthesized by a large gene cluster that includes a gene expressing a cluster-situated regulator (CSR). Pleiotropic regulators keep track of developmental status, nutrient availability, and a variety of stressors before sending signals to the CSR genes, which control antibiotic production [11]. The antibiotic regulatory network has to be elucidated in order to find new approaches to enhance antibiotic production and activate cryptic antibiotic synthesis. In response to environmental and physiological changes, transcriptional regulators at various levels tightly govern the onset of morphological development, which is often associated with antibiotic production [12,13,14]. In many cases, one or more CSRs control the transcription of structural genes within antibiotic biosynthetic gene clusters. CSRs, in turn, are subject to a complex network of higher-level regulators [15].
In recent years, in silico analysis of gene sequences and their products has become a common method for identifying gene expression patterns and sequences responsible for the synthesis of novel drugs. This has also led to the identification of numerous new medicinal products. A number of computational tools have been developed to assist researchers in this discipline. The majority of these tools are based on the in silico study of specific genes and gene products [16]. Therefore, the aim of this computational study was to analyze the promoter regions and regulatory elements of the transcription-regulatory bldD gene from antibiotic-producing Streptomyces species.
Determining the location of the transcription start sites and promoter region of a given gene is vital for studying the mechanism of gene regulation. The core promoter is the minimum promoter region capable of initiating basal transcription. It contains a transcription start site (TSS) and typically spans from −60 to +40 relative to the TSS [17]. In this study, we included ≥1 kb upstream of the coding regions to locate the TSS using the NNPP toolset. The results of our analysis indicated that all bldD genes (13/13, 100%) from the antibiotic-producing Streptomyces species included in this study have only one TSS. For 11/13 (84.62%) of the genes, the TSS is located less than 100 bp upstream of the start codon of the bldD gene (Table 1). This indicates that the transcriptional regulatory elements of the bldD gene in antibiotic-producing Streptomyces species are located close to the gene's start codon. The BPROM program uses a linear discriminant function (LDF) to make a prediction based on the characteristics of the -200 to +50 bp region around the TSS [17,18,19], where a higher LDF indicates a higher probability of expression of the gene. Accordingly, 150 bp were included to locate the -10 box positions (highly conserved regions) and -35 box positions (less conserved regions) of the gene. The core promoters of the bldD genes 58470067, 66853606, 61473082, and [69807580 & 69764486] have the highest LDF values of 2.18, 2.14, 2.04, and 2.03, respectively, whilst the core promoters of the bldD genes [58431103 & 24306276] have the lowest LDF values of 1.18 and 1.47, respectively.
Table 1
Identified TSS, distance from gene start codon, and LDF value determined using the NNPP toolset version 2.2 and BPROM with the minimum standard predictive promoter score and cut-off value of 0.8.

| Gene name | Gene ID | Chromosome location | Number of predicted TSS | TSS position | -10 box position | -35 box position | Linear discriminant function (LDF) value |
|---|---|---|---|---|---|---|---|
| bldD | 6213046 | NC_010572 | 1 | 97 | 82 | 61 | 1.99 |
| bldD | 24306276 | NC_013929.1 | 1 | 122 | 107 | 86 | 1.47 |
| bldD | 15149186 | NC_020990.1 | 1 | 122 | 107 | 86 | 1.68 |
| bldD | 66853606 | NZ_CP048261.1 | 1 | 98 | 83 | 62 | 2.14 |
| bldD | 63978737 | NZ_CP070242.1 | 1 | 97 | 82 | 61 | 1.99 |
| bldD | 61473082 | NZ_CP065253.1 | 1 | 89 | 74 | 53 | 2.04 |
| bldD | 58431103 | NZ_KV757141.1 | 1 | 97 | 82 | 61 | 1.18 |
| bldD | 58470067 | NZ_BBQG01000011.1 | 1 | 95 | 80 | 60 | 2.18 |
| bldD | 69878388 | NZ_CP086102.1 | 1 | 97 | 82 | 61 | 1.99 |
| bldD | 69863271 | NZ_CP018074.1 | 1 | 98 | 83 | 62 | 1.87 |
| bldD | 69807580 | NZ_JAGJBY010000001 | 1 | 89 | 74 | 53 | 2.03 |
| bldD | 69764486 | NZ_CP043317.1 | 1 | 87 | 72 | 51 | 2.03 |
| bldD | 57807597 | NZ_JABSUS0100000.1 | 1 | 97 | 82 | 61 | 1.99 |
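
The summary figures drawn from Table 1 can be rechecked with a short script. The following is a minimal sketch and was not part of the NNPP/BPROM workflow; it transcribes the TSS positions and LDF values from Table 1 and reports how many genes have a TSS less than 100 bp upstream of the start codon, together with the genes that have the highest and lowest LDF. The names `TABLE1` and `summarize` are hypothetical and introduced only for illustration.

```python
# TSS position (bp upstream of the start codon) and LDF value per Gene ID,
# transcribed from Table 1. The dictionary name is illustrative only.
TABLE1 = {
    "6213046": (97, 1.99),
    "24306276": (122, 1.47),
    "15149186": (122, 1.68),
    "66853606": (98, 2.14),
    "63978737": (97, 1.99),
    "61473082": (89, 2.04),
    "58431103": (97, 1.18),
    "58470067": (95, 2.18),
    "69878388": (97, 1.99),
    "69863271": (98, 1.87),
    "69807580": (89, 2.03),
    "69764486": (87, 2.03),
    "57807597": (97, 1.99),
}


def summarize(table, cutoff=100):
    """Return (count, percent) of genes with TSS < cutoff, plus the
    Gene IDs with the highest and lowest LDF value."""
    near = [gid for gid, (tss, _) in table.items() if tss < cutoff]
    percent = round(100 * len(near) / len(table), 2)
    highest = max(table, key=lambda gid: table[gid][1])
    lowest = min(table, key=lambda gid: table[gid][1])
    return len(near), percent, highest, lowest


if __name__ == "__main__":
    count, percent, highest, lowest = summarize(TABLE1)
    print(f"{count}/{len(TABLE1)} genes ({percent}%) have a TSS < 100 bp upstream")
    print(f"highest LDF: {highest}, lowest LDF: {lowest}")
```

Changing `cutoff` gives the same summary for other distance thresholds.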
Conserved motifs in the bldD genes of the 13 antibiotic-producing Streptomyces species were analyzed using the MEME software. For each promoter region, five candidate motifs were identified (Table 2). The presence of common motifs that serve as binding sites for transcription factors affecting the expression of the gene was determined. The motif with the lowest E-value (MtS1) was submitted to TOMTOM. Our analysis showed that all five motifs were present in all 13 of the 5' promoter regions (100%) and had the same number of binding sites; however, they differed in their statistical expectation value (E-value). In addition, MtS2, MtS5, MtS1, MtS3, and MtS4 had 19, 18, 16, 13, and 11 binding site matches in the motif database, respectively.
Table 2
List of discovered motifs, number of promoter-containing motifs, number of binding sites and total number of binding site matches the bldD gene via motif provided in motif database.

| Discovered motif | Number of promoters containing the motif | E-value | Motif width | Number of motif binding sites | Total number of binding site matches in database |
|---|---|---|---|---|---|
| MtS1 | 13 (100%) | 1.0e-216 | 50 | 13 | 16 |
| MtS2 | 13 (100%) | 1.3e-215 | 50 | 13 | 19 |
| MtS3 | 13 (100%) | 7.7e-209 | 50 | 13 | 13 |
| MtS4 | 13 (100%) | 8.1e-202 | 50 | 13 | 11 |
| MtS5 | 13 (100%) | 1.6e-197 | 50 | 13 | 18 |
In addition, MEME generated thirteen candidate motifs distributed from the TSS (+1) to ≥1 kb upstream. All candidate motifs were located on the positive strand. The binding sites of MtS1 range from -200 to -700 relative to the TSS, which places them closest to the TSS. In contrast, MtS2 lies in the -500 to -1000 range and MtS3 in the -600 to -1000 range, so both are distant from the TSS. Overall, 53/65 (81.53%) of the identified motif sites were found within the range of +1 to -700. From these results, it is possible to suggest that the transcriptional regulator BldD binds to the motif closest to the TSS and activates antibiotic-synthesizing genes (Fig. 1).
Transcription factors (TFs) are essential regulatory proteins that control gene expression. Using TOMTOM, we compared MtS1 with the publicly accessible prokaryotic motif database. The analysis showed numerous matches between MtS1 and the registered motifs. We identified 11 transcription factors associated with MtS1, including a putative DNA-binding protein, integration host factor subunit alpha, RNA polymerase sigma-54 factor, positive alginate biosynthesis regulatory protein, AraC family transcriptional regulator, nucleoid-associated protein EspR, sigma factor PvdS, macrodomain Ter protein, and RNA polymerase sigma-70 family protein (Table 3). These transcription factors have different molecular and biological functions in different groups of organisms. Our study revealed that most of them share common functions across microorganisms. Notably, the predominant functions are DNA-binding transcription activator activity and transcription cis-regulatory region binding. In addition, positive and negative regulation of DNA-templated transcription are common features of these transcription factors.
Table 3
List of matching candidate transcription factors (TFs) which could bind to common MtS1 and motif GO terms for motif MtS1

| Organism name | Transcription factor/protein | Gene name | GO – Molecular function | GO – Biological process | E-value | Gene expression database |
|---|---|---|---|---|---|---|
| Streptomyces coelicolor A3(2) | Putative DNA-binding protein | SCO1489 | DNA-binding transcription repressor activity, Nucleotide binding, Sequence-specific DNA binding & Transcription cis-regulatory region binding | Negative regulation of transcription, DNA-templated | 5.84e-02 | Collectf/ EXPREG_00000fc0 |
| Pseudomonas putida (strain ATCC 47054 | Integration host factor subunit alpha | ihfA | DNA-binding transcription activator activity, DNA-binding transcription repressor activity & Transcription cis-regulatory region binding | DNA recombination & Regulation of translation | 1.20e-01 | Collectf/ EXPREG_000006f0 |
| Vibrio cholerae serotype O1 (strain ATCC 39315) | RNA polymerase sigma-54 factor | VC_2529 | DNA binding, DNA-binding transcription activator activity, DNA-directed 5'-3' RNA polymerase activity & Sigma factor activity | DNA-templated transcription and initiation | 2.71e+00 | Collectf/ EXPREG_000016e0 |
| Pseudomonas aeruginosa (strain ATCC 15692 | Positive alginate biosynthesis regulatory protein | algR | DNA-binding transcription activator activity, DNA-binding transcription repressor activity, Phosphorelay response regulator activity, Sequence-specific DNA binding & Transcription cis-regulatory region binding | Alginic acid biosynthetic process, Bacterial-type flagellum-dependent swarming motility, Negative regulation of transcription, DNA-templated, Positive regulation of cell motility, Positive regulation of single-species biofilm formation, Positive regulation of transcription, DNA-templated, Regulation of response to reactive oxygen species, Regulation of transcription, DNA-templated & Type IV pilus-dependent motility | 3.44e+00 | EXPREG_000009d0 |
| Xanthomonas oryzae pv. oryzae | AraC family transcriptional regulator | hrpXo | DNA-binding transcription activator activity & Transcription cis-regulatory region binding | Positive regulation of transcription, DNA-templated | 4.72e+00 | Collectf/ EXPREG_000017f0 |
| Streptomyces coelicolor (strain ATCC BAA-471 | AraC-family transcriptional regulator | SCO2792 | DNA-binding transcription factor activity & Sequence-specific DNA binding | Positive regulation of transcription, DNA-templated | 4.33e+00 | Collectf/ EXPREG_00001770 |
| Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv) | Nucleoid-associated protein EspR | espR | DNA binding & Identical protein binding | Regulation of protein secretion, Regulation of transcription, DNA-templated & Response to host immune response | 5.32e+00 | Collectf/ EXPREG_00000c30 |
| Pseudomonas putida (strain ATCC 47054 | Integration host factor subunit alpha | ihfA | DNA-binding transcription activator activity, DNA-binding transcription repressor activity & Transcription cis-regulatory region binding | DNA recombination & Regulation of translation | 8.00e+00 | Collectf/ EXPREG_000006f0 |
| Pseudomonas aeruginosa (strain ATCC 15692 | Sigma factor PvdS | pvdS | DNA-binding transcription activator activity, Sigma factor activity & Transcription cis-regulatory region binding | Cellular response to iron ion, DNA-templated transcription, initiation, Positive regulation of secondary metabolite biosynthetic process, Positive regulation of transcription, DNA-templated & Regulation of transcription, DNA-templated | 8.14e+00 | Collectf/ EXPREG_000004b0 |
| Escherichia coli (strain K12) | Macrodomain Ter protein | matP | Sequence-specific DNA binding | Cell cycle, Cell division, Chromosome organization, Chromosome segregation & Regulation of transcription, DNA-templated | 8.19e+00 | Collectf/ EXPREG_000007b0 |
| Pseudomonas syringae pv. tomato (strain ATCC BAA-871/ DC3000) | RNA polymerase sigma-70 family protein | PSPTO_2133 | DNA-binding transcription activator activity, Sigma factor activity & Transcription cis-regulatory region binding | DNA-templated transcription, initiation, Positive regulation of transcription, DNA-templated & Response to stimulus | 8.19e+00 | Collectf/ |
Transcription factor binding sites (TFBS) are also crucial for understanding the regulation of gene expression [20]. The bldD gene promoter sequences of the thirteen antibiotic-producing Streptomyces species were submitted to GLAM2, and the GLAM2 HTML output was searched for TFBS. As depicted in Fig. 2, a TFBS of 49 nucleotide base pairs was identified. The alignment was also inspected for deletions and insertions among the 13 antibiotic-producing Streptomyces. The aligned columns contained no deletions or insertions, so the alignment of the bldD genes of antibiotic-producing Streptomyces was ungapped. In addition, the GLAM2 analysis showed that 8/13 (61.53%) of the bldD genes have a high marginal score of 91.6. These species have a strong motif and better matches to the overall motif, suggesting a strong binding site for the transcriptional regulator BldD. In contrast, bldD 69863271 showed a lower marginal score of 75.2, suggesting that this species has a weaker motif and fewer matches to the overall motif, which may indicate a lower gene expression level.
Comparisons of the Candidate Motifs to the Database Motifs
The thirteen candidate motifs were compared to the motifs in the motif databases (Collectf and EXPREG). Our study showed that all candidate motifs (13/13, 100%) share the same TFBS with VqsM_P.aeruginosa and RpoN_V.cholerae. Additionally, 9/13 (69.23%) of the candidate motifs share the same TFBS with PhoP_Y.pestis, for which 52% of the regulatory interactions are activating and 47% are repressing (Table 4). Example logos for the optimal comparison of the IHF_P.putida sequence and MtS1 of the target motifs with the discovered motif in the database are depicted in Fig. 3. The fact that the candidate motifs share TFBS with motifs in the publicly available databases (Collectf and EXPREG) could suggest that the bldD gene plays a role in the antibiotic production of the Streptomyces species.
Table 4
Motifs in the Collectf and EXPREG databases that match the sequence-enriched candidate motifs with E-values ≤ 10.

| Alternate name for the motif provided in the motif database file | Regulatory mode: Activ. (%) | Regulatory mode: Rep. (%) | Regulatory mode: Dual (%) | Regulatory mode: NS (%) | E-value | Number of primary sequences matching the motif | Motif database file |
|---|---|---|---|---|---|---|---|
| PhoP_Y.pestis | 52 | 47 | 0 | 0 | 1.92e-2 | 9 / 13 (69.2%) | EXPREG_00000050 |
| IHF_P.putida | 55 | 45 | 0 | 0 | 6.92e-2 | 8 / 13 (61.5%) | EXPREG_00000700 |
| ArgR_P.aeruginosa | 61 | 27 | 0 | 11 | 1.01e-1 | 11 / 13 (84.6%) | EXPREG_00000470 |
| Fur_V.cholerae | 0 | 100 | 0 | 0 | 1.01e-1 | 11 / 13 (84.6%) | EXPREG_000008b0 |
| ToxT_V.cholerae | 81 | 18 | 0 | 0 | 2.19e-1 | 7 / 13 (53.8%) | EXPREG_00000240 |
| OmpR_Y.pestis | 100 | 0 | 0 | 0 | 2.19e-1 | 7 / 13 (53.8%) | EXPREG_00001000 |
| CcpA_S.suis | 17 | 25 | 0 | 57 | 2.19e-1 | 7 / 13 (53.8%) | EXPREG_00001810 |
| CRP_V.vulnificus | 100 | 0 | 0 | 0 | 3.03e-1 | 12 / 13 (92.3%) | EXPREG_00001030 |
| Fur_N.gonorrhoeae | 60 | 10 | 0 | 30 | 4.05e-1 | 10 / 13 (76.9%) | EXPREG_00000ec0 |
| VqsM_P.aeruginosa | 7 | 0 | 0 | 92 | 4.38e-1 | 13 / 13 (100.0%) | EXPREG_00001670 |
| RpoN_V.cholerae | 100 | 0 | 0 | 0 | 4.38e-1 | 13 / 13 (100.0%) | EXPREG_000016e0 |
| Lrp_E.coli | 1 | 1 | 0 | 97 | 4.54e-1 | 12 / 13 (92.3%) | EXPREG_00000840 |
| Zur_N.meningitidis | 15 | 84 | 0 | 0 | 1.42e0 | 10 / 13 (76.9%) | EXPREG_000016a0 |
| IHF_P.putida | 100 | 0 | 0 | 0 | 1.93e0 | 9 / 13 (69.2%) | EXPREG_000006f0 |
| PvdS_P.aeruginosa | 100 | 0 | 0 | 0 | 3.44e0 | 8 / 13 (61.5%) | EXPREG_000004b0 |
| Fur_A.ferrooxidans | 0 | 63 | 0 | 36 | 4.02e0 | 4 / 13 (30.8%) | EXPREG_00000370 |
| Vfr_P.aeruginosa | 41 | 11 | 0 | 47 | 4.02e0 | 4 / 13 (30.8%) | EXPREG_00000b50 |
| PhhR_P.putida | 90 | 10 | 0 | 0 | 4.02e0 | 4 / 13 (30.8%) | EXPREG_00001190 |
| Fur_P.aeruginosa | 0 | 100 | 0 | 0 | 5.11e0 | 11 / 13 (84.6%) | EXPREG_00000c80 |
| CsgD_E.coli | 33 | 22 | 0 | 44 | 5.81e0 | 9 / 13 (69.2%) | EXPREG_00000b00 |
| LexA_P.difficile | 0 | 0 | 0 | 100 | 6.02e0 | 6 / 13 (46.2%) | EXPREG_00000120 |
| CRP_E.coli | 82 | 17 | 0 | 0 | 6.02e0 | 6 / 13 (46.2%) | EXPREG_00000850 |
| H-NS_V.cholerae | 0 | 100 | 0 | 0 | 6.38e0 | 13 / 13 (100.0%) | EXPREG_00001730 |
| CcpA_C.difficile | 9 | 36 | 0 | 53 | 6.73e0 | 5 / 13 (38.5%) | EXPREG_00000d10 |
| LasR_P.aeruginosa | 98 | 1 | 0 | 0 | 9.24e0 | 3 / 13 (23.1%) | EXPREG_000009b0 |
| OxyR_P.aeruginosa | 3 | 0 | 0 | 96 | 9.24e0 | 3 / 13 (23.1%) | EXPREG_00001560 |
| AdpA_S.coelicolor | 100 | 0 | 0 | 0 | 9.40e0 | 9 / 13 (69.2%) | EXPREG_00001770 |
Activ: activation, Rep: repression, NS: not specified, IHF: integration host factor, ArgR: arginine-responsive regulator, Fur: ferric uptake regulator, OmpR: outer membrane protein regulator, CcpA: catabolite control protein A, CRP: cyclic AMP receptor protein, VqsM: virulence and quorum-sensing modulator protein, RpoN: RNA polymerase sigma-54 factor, Lrp: leucine-responsive regulatory protein, Zur: zinc uptake regulator, PvdS: sigma factor for pyoverdine (siderophore) synthesis, Vfr: virulence factor regulator, PhhR: phenylalanine hydroxylase regulator, CsgD: curlin subunit gene D, H-NS: histone-like nucleoid structuring protein, OxyR: oxidative stress regulator.
Two techniques were used for the CpG island analysis. The first was the offline tool CLC Genomics Workbench version 8.5, which was used to search for MspI restriction enzyme sites yielding fragment sizes between 40 and 220 bp. The results showed that, among the 13 Streptomyces species containing bldD genes, only one species (1/13, 7.69%; GI: 69878388) was classified as a single cut, whereas the remaining species were classified as multiple cuts (Table 5). The second was the Takai and Jones algorithm; the possible CpG island regions and CpG island density are shown in Fig. 4. Our study revealed that only one putative CpG island was detected for each gene sequence. However, the GC content varies between species, ranging from 68 to 73%. The bldD genes of the antibiotic-producing Streptomyces species GI: 61473082 and GI: 24306276 had the highest (73%) and lowest (68%) GC content, respectively.
Table 5
Identification of MspI cutting sites and fragment sizes (40 to 220 bp) for the bldD gene of Streptomyces species

| Gene ID | Nucleotide positions of MspI enzyme cutting sites | Fragment sizes (40-220) |
|---|---|---|
| 6213046 | Multiple cut (29, 64, 78, 115, 243, 402, 434, 443, 467, 686, 819, 849, 876, 890, 1109, 1121, 1205, 1269, 1278, 1309, 1313, 1322, 1350, 1534, 1569, 1588, 1619, 1670, 1676, 1684, 1713, 1826, 1841) | 128,159,219,133,219,84,64,184,51,113 |
| 24306276 | Multiple cut (19, 55, 100, 112, 384, 719, 729, 810, 840, 854, 867, 881, 1100, 1112, 1206, 1432, 1523, 1577, 1608, 1618, 1659, 1669, 1673, 1702, 1847, 1976, 1987, 1992) | 45,81,219,94,91,54,145,129 |
| 15149186 | Multiple cut (49, 130, 143, 175, 184, 265, 427, 460, 467, 488, 569, 599, 613, 626, 784, 790, 859, 871, 973, 1013, 1035, 1042, 1050, 1057, 1072, 1099, 1136, 1143, 1207, 1282, 1364, 1418, 1432, 1444, 1471, 1500, 1514, 1633, 1645, 1782, 1799, 1804, 1809, 1832, 1861, 1945, 1959, 1987, 1992) | 81,81,162,81,158,69,102,40,64,75,82,54,119,137,84 |
| 66853606 | Multiple cut (12, 29, 53, 68, 343, 368, 627, 660, 667, 688, 799, 813, 826, 1059, 1071, 1155, 1212, 1230, 1253, 1262, 1527, 1612, 1663, 1670, 1674, 1700, 1715, 1825) | 111,84,57,85,51 |
| 63978737 | Multiple cut (17, 53, 95, 110, 382, 727, 747, 808, 838, 852, 865, 879, 1098, 1205, 1433, 1524, 1578, 1609, 1660, 1670, 1674, 1724, 1977, 1987, 1992) | 42,61,219,107,91,54,51,50 |
| 61473082 | Multiple cut (20, 56, 113, 291, 385, 447, 590, 669, 730, 738, 811, 841, 868, 1026, 1032, 1068, 1101, 1113, 1255, 1273, 1278, 1294, 1502, 1556, 1587, 1609, 1638, 1652, 1681, 1794, 1963, 1977) | 57,178,94,62,143,79,61,73,158,142,54,113,169 |
| 58431103 | Multiple cut (26, 62, 234, 302, 396, 421, 601, 680, 720, 741, 761, 822, 852, 893, 1112, 1124, 1245, 1302, 1423, 1432, 1514, 1568, 1650, 1696, 1928) | 172,68,94,180,79,40,61,41,219,57,121,82,54,82,46 |
| 58470067 | Multiple cut (167, 228, 248, 309, 339, 353, 366, 380, 599, 706, 762, 1024, 1078, 1109, 1160, 1170, 1174, 1203) | 61,61,219,107,56,54,51 |
| 69878388 | Single cut (34, 67, 79, 474, 478, 513, 532, 614, 992) | 82 |
| 69863271 | Multiple cut (2, 137, 144, 203, 223, 384, 430, 562, 566, 711, 757, 767, 865, 899, 933, 940, 956, 966, 972, 988) | 135,59,161,46,132,145,46,98 |
| 69807580 | Multiple cut (15, 88, 109, 149, 244, 576, 660, 690, 704, 731, 875, 950, 1132, 1187, 1207, 1299, 1530, 1585, 1637, 1680, 1712, 1761, 1822, 1882) | 73,40,95,84,144,75,182,55,92,55,52,43,49,61,60 |
| 69764486 | Multiple cut (31, 71, 166, 206, 370, 482, 510, 591, 621, 648, 662, 812, 881, 893, 1029, 1101, 1110, 1231, 1322, 1376, 1458, 1504, 1633, 1849) | 40,95,40,164,112,81,150,69,136,72,121,91,54,82,46,129,216 |
| 57807597 | Multiple cut (185, 217, 502, 530, 611, 655, 668, 682, 901, 913, 1031, 1065, 1149, 1153, 1183, 1367, 1413, 1448, 1467, 1549, 1563, 1592, 1876, 1881, 1892) | 81,44,219,118,84,184,46,82 |
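
The relationship between the cut positions and the fragment sizes in Table 5 can be illustrated in a few lines. The following is a minimal sketch, not the CLC Genomics Workbench procedure itself: it assumes that a fragment size is the distance between two consecutive MspI cut positions and keeps only sizes between 40 and 220 bp. MspI recognizes CCGG and cuts after the first C. The function names `mspi_cut_positions` and `fragments_in_range` are hypothetical.

```python
import re


def mspi_cut_positions(seq):
    """Return 1-based positions of the base after which MspI cuts (C^CGG)."""
    seq = seq.upper()
    # A lookahead reports every CCGG start; the cut falls after the first C.
    return [m.start() + 1 for m in re.finditer(r"(?=CCGG)", seq)]


def fragments_in_range(cuts, min_len=40, max_len=220):
    """Sizes of fragments between consecutive cuts that fall in the range."""
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    return [s for s in sizes if min_len <= s <= max_len]


if __name__ == "__main__":
    # Cut positions for Gene ID 6213046, transcribed from Table 5.
    cuts_6213046 = [29, 64, 78, 115, 243, 402, 434, 443, 467, 686, 819, 849,
                    876, 890, 1109, 1121, 1205, 1269, 1278, 1309, 1313, 1322,
                    1350, 1534, 1569, 1588, 1619, 1670, 1676, 1684, 1713,
                    1826, 1841]
    print(fragments_in_range(cuts_6213046))
```

Under this assumption, the filtered sizes for Gene ID 6213046 agree with the fragment sizes listed for that gene in Table 5.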
The nucleotide sequences of the bldD gene from the 13 antibiotic-producing Streptomyces species and from 20 other related Streptomyces species were combined and aligned, and a phylogenetic tree was then constructed. Four main criteria were used to read, compare, and interpret species relationships and divergences: the distance between branch tips, the number of nodes between species, the time to common ancestors, and the number of shared monophyletic groups. A random anchor (a stretch of 3108 nucleotides), Kitasatospora setae strain KM-6054 23S, was used to anchor the distance between antibiotic-producing and other Streptomyces. A combined analysis of the data yielded a single significant cladogram with ten clusters and two clades. Our analysis revealed that the antibiotic-producing Streptomyces species containing bldD genes fall into clusters II, IV, V, VI, VII, and X (Fig. 5).
The genus Streptomyces is considered to be an important source of bioactive compounds. Several regulatory proteins play a critical role in the activation of antibiotic biosynthetic genes, of which BldD functions at the top of the regulatory cascade that controls Streptomyces development and activation of antibiotic production [21]. In addition, several laboratory-based in vitro studies have indicated the significant role of BldD in the regulation of antibiotic production [22]. In Streptomyces coelicolor, for example, BldD is a transcriptional regulator required for morphological development and antibiotic synthesis [22]. In silico analysis of the transcriptional regulatory elements of the bldD gene in antibiotic-producing Streptomyces species could improve the understanding of drug development and facilitate the implementation of laboratory-based in vitro experiments.
In this study, using the NNPP and BPROM web-based programs, we identified the promoter region and TSS that are located closely upstream of the coding regions of the bldD gene of antibiotic-producing Streptomyces species. Our study also showed that the bldD gene of every antibiotic-producing Streptomyces considered here has a single TSS located close to the start codon, suggesting that the genes are expressed from a single TSS. The presence of the TSSs in close vicinity of the protein-coding region is consistent with the findings of Lee et al. (2022) [23], who showed that most TSSs in Streptomyces were located within 5–100 bp upstream of the start codon. This suggests that the bldD gene in antibiotic-producing Streptomyces regulates the expression of the target gene through closely located regulators. Our result is also comparable to the study conducted by Jeong et al. (2016) [24], who mapped a total of 68 TSSs to 18 of the 28 secondary metabolic gene clusters identified in the S. coelicolor genome and identified an average of 1 TSS for every 2.3 protein-coding genes. TSSs located from 500 bp upstream to 150 bp downstream of the annotated start codon of each ORF have been classified as primary (P) or secondary (S) TSSs [23]. The regulation of gene expression at the transcriptional level is a fundamental process found in all biological systems [25,26].
Transcription factors are proteins that bind to DNA regulatory sequences (enhancers and silencers), usually localized in the 5'-upstream region of target genes, to modulate the rate of gene transcription. This may result in increased or decreased gene transcription, protein synthesis, and subsequently altered cellular function [27]. In the promoter regions of transcription units, TFs attach to short DNA sequence motifs commonly known as binding sites. Position-specific scoring matrices (PSSMs) can be used to summarize all of the different binding sites recognized by the same TF in a single model. Such matrices give the probability of observing a particular nucleotide at a particular position and can be displayed as a sequence logo [20]. Bacterial transcriptional regulators are classified into ~50 families on the basis of sequence alignment and structural and functional criteria [26]. Our study identified five significant motifs in the promoter sequence regions of each bldD gene of the Streptomyces species. The existence of common motifs acting as TFBS was also established. Among the five motifs discovered, Motif 1 (MtS1) has the lowest E-value and is the key regulatory motif for bldD genes.
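
To make the PSSM idea concrete, the following is a minimal sketch that builds a log-odds matrix from a handful of aligned binding sites and scores a candidate sequence against it. It does not reproduce the MEME or TOMTOM algorithms. The example sites are a made-up toy alignment rather than sequences from this study, and the pseudocount of 1 and the uniform background of 0.25 per base are simplifying assumptions (a GC-rich genome such as Streptomyces would call for a different background). The function names are hypothetical.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PSSM (one dict per column) from equal-length sites."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(width):
        column = [site[i] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pssm


def score(pssm, window):
    """Sum the column scores for a window of the same width as the PSSM."""
    return sum(col[base] for col, base in zip(pssm, window))


def consensus(pssm):
    """Most favoured base at each position."""
    return "".join(max(col, key=col.get) for col in pssm)


if __name__ == "__main__":
    # Toy alignment for illustration only (not data from this study).
    toy_sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
    pssm = build_pssm(toy_sites)
    print(consensus(pssm))
    print(score(pssm, "TATAAT"), score(pssm, "GCGCGC"))
```

A window that resembles the aligned sites receives a positive score, while an unrelated window receives a negative one; this is the kind of comparison on which motif-matching tools rely.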
The comparative analysis of MtS1 against the known prokaryotic motif databases showed that MtS1 matches 11 TFs, including a putative DNA-binding protein, integration host factor subunit alpha, RNA polymerase sigma-54 factor, positive alginate biosynthesis regulatory protein, AraC family transcriptional regulator, nucleoid-associated protein EspR, sigma factor PvdS, macrodomain Ter protein, and RNA polymerase sigma-70 family protein. The functional roles of these TFs in bacteria include metabolism, virulence and pathogenesis, replication, and regulation of several transcriptional processes (Table 3). Analogous results have been reported in other bacterial species [10,9,26], suggesting that these TFs are conserved across prokaryotes. The identification of TFs associated with metabolism and regulation suggests a potential role for the identified motif in the regulation and activation of secondary metabolism in Streptomyces.
Consistent with our results, Fang et al. (2018) reported that a novel AraC-family transcriptional regulator, SAV742, is a global regulator that negatively controls avermectin biosynthesis and cell growth but positively controls morphological differentiation [28]. Avermectins are useful anthelmintic antibiotics produced by Streptomyces avermitilis. In addition, AraC family members were reported to be among the key transcription factors in Streptomyces, playing a role in the control of genes involved in important biological processes such as carbon source utilization, morphological differentiation, secondary metabolism, pathogenesis, and stress responses [26]. Fang et al. (2018) also identified the regulatory role of the AraC-family transcriptional regulator BfvR (YPO1737 in strain CO92) in biofilm formation and virulence of Yersinia pestis biovar Microtus [29].
Recently, nucleoid-associated proteins have also been found to influence the expression of specialized metabolic clusters [30,31]. Leucine-responsive regulatory protein 2 (Lsr2) is a small nucleoid-associated protein found throughout the actinobacteria that plays a role similar to that of the well-studied histone-like nucleoid structuring protein (H-NS), in that it preferentially binds AT-rich sequences and represses gene expression [31,32,33]. In Streptomyces venezuelae, Lsr2 represses the expression of many specialized metabolic clusters, including the chloramphenicol biosynthetic gene cluster, and deleting lsr2 leads to significant upregulation of chloramphenicol cluster expression [32]. Bacteria, including Streptomyces, are also known to utilize protein ADP-ribosylation. Lalić et al. (2016) determined the crystal structure of the macrodomain protein SCO6735 from Streptomyces coelicolor and characterized it biochemically and functionally, showing that it can hydrolyze PARP-dependent protein ADP-ribosylation [34]. Expression of this protein is induced upon DNA damage, and its deletion in S. coelicolor increases antibiotic production.
CpG islands, a class of cis-regulatory element, were also examined, and the bldD genes were found to contain them. A promoter is classified as CpG rich if a CpG island is present within 5 kb of it and as CpG poor otherwise. In our case, the CpG island of each Streptomyces bldD gene appeared ≥2 kb upstream of the coding region, so the proportion of CpG-rich promoters is higher in our study. It is generally accepted that promoter regions correlate with CpG islands. CpG islands are regions of DNA longer than 200 bp with a G + C content of at least 50% and an observed number of CpG dinucleotides that is at least 60% of the number expected from the G + C content [18]. The principal difference between CpG island and non-CpG island promoters lies in how their transcription is repressed or modulated. Non-CpG island promoters maintain transcriptional repression through cytosine methylation at CpG dinucleotides [35]. DNA methylation is thought to regulate transcription both directly and indirectly; CpG methylation can directly repress transcription by preventing some transcription factors (TFs) from binding to their recognition motifs [36]. Interestingly, a single CpG island was detected for each bldD gene, which suggests that the gene is strongly expressed. Because the gene has a CpG island and is therefore subject to less repression, bldD has the potential to support the production of important antibiotics. The GC content, on the other hand, differed among the bldD genes of the antibiotic-producing Streptomyces species, ranging from 68 to 73%. In addition, the Streptomyces species containing the bldD gene are very closely related to other groups of Streptomyces. As a result, not only the Streptomyces species containing the bldD gene but also the other groups of Streptomyces may have the potential to produce important antibiotics (Fig. 5).
Genome mining and in silico analysis of gene sequences are very important for predicting gene expression patterns and for identifying the genes responsible for the synthesis of new drugs. Our study revealed that the bldD genes have TSSs and promoter regions in close vicinity to the start codon. It also identified TFs matching MtS1, one of the key motifs identified in this study. Phylogenetic analysis of bldD in the antibiotic-producing and other Streptomyces species indicated that these species are closely related, suggesting a shared evolutionary history. Therefore, this computational study can serve as a basis for laboratory-based experiments aimed at producing essential antibiotics.
Promoter regions are intrinsic DNA elements located upstream of genes and required for their transcription by RNA polymerase (RNAP). Some of the first approaches to mapping promoters were based on position weight matrices (PWMs) of the -10 and -35 box motifs, taking into account the distribution of spacer lengths between the motifs and their distance from TSSs [37]. Correct identification of promoters is a crucial step in studying gene expression in bacteria. Here, we consider promoters as the core elements recognized by the sigma subunit of RNAP. This sigma factor recognizes a consensus region of approximately 35 bp with two key elements, the -10 box (with the consensus motif TATAAT) and the -35 box (TTGACA), separated by 17±2 bp [38]. Besides the core promoter region, other cis-regulatory elements may play a relevant role in regulating gene expression [38]. In this study, the sequences of bldD genes from 13 Streptomyces species were retrieved in February 2022 from the National Center for Biotechnology Information (NCBI) Nucleotide database (http://www.ncbi.nlm.nih.gov) [39]. To determine promoter positions and TSSs, ≥1 kb of sequence upstream of the coding region of each bldD gene was submitted to the NNPP version 2.2 toolset and ≥150 bp was submitted to BPROM. All TSSs of each species' gene were screened using the NNPP toolset and BPROM algorithms [18].
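The core promoter definition above (a -35 box and a -10 box separated by 17±2 bp) can be illustrated with a minimal consensus scan in Python. This is only a sketch and not the NNPP or BPROM algorithm; the function names and the mismatch tolerance are hypothetical, and real promoters often deviate from the consensus hexamers.

```python
MINUS35 = "TTGACA"  # -35 box consensus given in the text
MINUS10 = "TATAAT"  # -10 box consensus given in the text

def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_core_promoters(seq, max_mismatch=2, spacers=range(15, 20)):
    """List candidate -35/-10 pairs with a spacer of 17±2 bp (15 to 19 bp).

    `max_mismatch` is an assumed tolerance applied to each box separately.
    Hits are returned best first (fewest total mismatches).
    """
    seq = seq.upper()
    box_len = len(MINUS35)
    hits = []
    for i in range(len(seq) - box_len + 1):
        mm35 = mismatches(seq[i:i + box_len], MINUS35)
        if mm35 > max_mismatch:
            continue
        for spacer in spacers:
            j = i + box_len + spacer
            box10 = seq[j:j + box_len]
            if len(box10) < box_len:
                break  # ran off the end of the sequence
            mm10 = mismatches(box10, MINUS10)
            if mm10 <= max_mismatch:
                hits.append({"start35": i, "start10": j,
                             "spacer": spacer, "mismatches": mm35 + mm10})
    return sorted(hits, key=lambda h: h["mismatches"])

# Synthetic example sequence with a perfect -35 box, 17 bp spacer and -10 box.
example = "GC" * 5 + MINUS35 + "G" * 17 + MINUS10 + "GCGC"
candidates = find_core_promoters(example)
```

A scan of this kind only proposes candidates; a dedicated predictor is still needed to score them against spacer-length and TSS-distance distributions.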
Common motifs in the bldD genes were analysed using Multiple Em for Motif Elicitation (MEME) software version 3.5.4 (http://meme-suite.org/tools/meme; http://meme.sdsc.edu), using sequences ≥1 kb upstream of the promoter positions or TSSs [37,40]. MEME usually finds the most statistically significant (low E-value) motifs; the E-value of a motif is based on its log likelihood ratio, width, number of sites, the background letter frequencies, and the size of the training set. The search results page was linked to the MEME output in HTML format, and the motif with the smallest expected value (E-value) was considered for further analysis. Each motif in the MEME output was forwarded, via the button provided, directly to TOMTOM, a web-based program that compares a query motif against a database of known motifs [41]. For this analysis, CollecTF (bacterial TF motifs) and EXPREG were used as reference databases of binding motifs. The rank of the primary sequences was compared to all ‘ab initio’ motifs discovered by Simple Enrichment Analysis (SEA), and the enrichment p-values were used to determine the motif ranks. The motif site distribution was set to zero or one occurrence per sequence (ZOOPS), the maximum number of motifs to 13, the motif E-value threshold to unlimited, the minimum motif width to 5 and the maximum motif width to 50 [42].
Sequence motifs are important tools in molecular biology and can describe and identify features in DNA, RNA, and protein sequences such as transcription factor binding sites, splice sites, and protein-protein interaction sites [43]. Several algorithms have been developed to discover motifs, as well as algorithms for searching databases for matches to a given motif or motifs. Gapped Local Alignment of Motifs (GLAM2) is one such specialized algorithm for DNA motif discovery and is also useful for identifying functional site motifs. The bldD gene promoter sequences of the thirteen antibiotic-producing Streptomyces species were therefore submitted to GLAM2, and the GLAM2 HTML output was searched for transcription factor binding sites (TFBSs) [43].
The CpG islands of the genes were determined using two algorithms. The first was the offline tool CLC Genomics Workbench version 8.5 (https://clc-genomics-workbench.software.informer.com/8.5/), which was used to search for cutting sites of the restriction enzyme MspI (with fragment sizes between 40 and 220 bp). The second was the algorithm of Kuo et al. (http://dbcat.cgm.ntu.edu.tw/), whose search criteria are a GC content greater than or equal to 55 percent and an observed CpG/expected CpG ratio threshold of 0.65 [44].
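The two search criteria above (GC content and observed/expected CpG ratio) can be illustrated with a minimal sliding-window sketch in Python. This is not the CLC Genomics Workbench or Kuo et al. implementation; the 200 bp window, the treatment of 0.65 as a minimum, and the function names are assumptions made for illustration.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)

def find_cpg_islands(seq, window=200, min_gc=0.55, min_oe=0.65):
    """Merge consecutive passing windows into (start, end) islands.

    The thresholds follow the criteria quoted in the text; the window size is
    an assumed parameter. Coordinates are 0-based and end-exclusive.
    """
    seq = seq.upper()
    islands = []
    start = end = None
    for i in range(len(seq) - window + 1):
        w = seq[i:i + window]
        if gc_content(w) >= min_gc and cpg_obs_exp(w) >= min_oe:
            if start is None:
                start = i
            end = i + window
        elif start is not None:
            islands.append((start, end))
            start = None
    if start is not None:
        islands.append((start, end))
    return islands

# Synthetic example: a CpG-dense block flanked by AT-only sequence.
example = "AT" * 150 + "CG" * 150 + "AT" * 150
islands = find_cpg_islands(example)
```

In a uniformly GC-rich genome most windows pass the GC criterion, so the observed/expected ratio does most of the discriminating.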
In addition to the 13 bldD genes of the 13 antibiotic-producing Streptomyces species considered in this study, 20 related Streptomyces sequences were collected on the basis of the lowest E-values, and all sequences were aligned using the MUSCLE multiple alignment tool. The phylogenetic tree was constructed by the UPGMA method in MEGA 6.0 from the aligned sequences [45,46], and the phylogenetic relationship of the Streptomyces species containing the bldD genes was inferred from these significantly aligned prokaryotic sequences. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (100 replicates) is shown next to the branches. The evolutionary distances were computed using the Maximum Composite Likelihood method and are in units of the number of base substitutions per site. The analysis involved 34 nucleotide sequences, including the random anchor (a stretch of 3108 nucleotides), Kitasatospora setae strain KM-6054 23S. All positions containing gaps and missing data were eliminated (complete deletion option).
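The clustering step of the UPGMA method mentioned above can be sketched in a few lines of Python. This is a minimal illustration on a precomputed distance matrix, not the MEGA implementation: it does not compute Maximum Composite Likelihood distances or bootstrap support, and the taxon names and distances in the example are hypothetical.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA and return the tree as a Newick string.

    `names` is a list of labels and `dist` a symmetric matrix (list of lists)
    of pairwise distances in the same order.
    """
    n = len(names)
    # Each cluster stores (newick text, number of leaves, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        na, sa, ha = clusters.pop(a)
        nb, sb, hb = clusters.pop(b)
        h = dab / 2  # height of the new internal node
        newick = f"({na}:{h - ha:.3f},{nb}:{h - hb:.3f})"
        # Distance to every remaining cluster is the size-weighted average.
        new_d = {}
        for c in clusters:
            dac = d[tuple(sorted((a, c)))]
            dbc = d[tuple(sorted((b, c)))]
            new_d[(c, next_id)] = (sa * dac + sb * dbc) / (sa + sb)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        clusters[next_id] = (newick, sa + sb, h)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"

# Hypothetical taxa and distances, for illustration only.
example_names = ["A", "B", "C", "D"]
example_dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree = upgma(example_names, example_dist)
```

Because UPGMA assumes a constant rate of change across lineages, the resulting tree is ultrametric: every leaf lies at the same distance from the root.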
Availability of Data and Materials
The datasets generated and/or analysed during the current study are available in the NCBI repository, a member of the International Nucleotide Sequence Database Collaboration (INSDC):
https://www.ncbi.nlm.nih.gov/gene/?term=bldD+transcriptional+regulator+BldD.
The anchor dataset generated and/or analysed during the current study is available in the same repository under accession number NC_016109.1.
Abbreviations
bldD: Bald gene
BPROM: Bacterial Promoter
CpG: Cytosine-Phosphate-Guanine
CSRs: Cluster-Situated Regulators
GLAM2: Gapped Local Alignment of Motifs 2
NCBI: National Center for Biotechnology Information
NNPP: Neural Network Promoter Prediction
TFBS: Transcription Factor Binding Site
TF: Transcription Factor
TSS: Transcription Start Site
Acknowledgement
The authors would like to acknowledge Adama Science and Technology University, Applied Biology Department.
Funding
Not applicable
Affiliations
Department of Applied Biology, Institute of Pharmaceutical Science, Adama Science and Technology University, Adama, Ethiopia
Sisay Demisie & Ketema Tafess
Contributions
SD and KT conceived and designed the research plans, and KT supervised the manuscript. SD drafted the manuscript and carried out the computational study. Both authors performed the computational data analysis, revised the manuscript, and read and approved the final manuscript.
Corresponding author
Correspondence to Sisay Demisie
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.