Abd-El-Haleem et al. [6] discovered the phenol-degrading bacterium Acinetobacter sp. strain DF4 in Egyptian industrial wastewater, showcasing its bioremediation potential. Their prior work [6] involved a bioluminescent reporter system using a phenol-inducible mopR-like promoter segment from DF4. Interestingly, clone DF4-8 displayed bioluminescence in response to phenol, while clone DF4-11, which shares the same promoter, did not, possibly because of sequence differences. To elucidate this discrepancy, DF4-11 underwent in-depth TA cloning, PCR amplification, and DNA sequencing analyses. The subsequent genomic and proteomic scrutiny aimed to unravel the genetic underpinnings behind clone DF4-11's inactivity and sequence divergence. The exploration of these variations not only sheds light on bioluminescence disparities but also unveils potential genetic elements with broader biotechnological implications, offering avenues for biotechnology applications in diagnostics, therapeutics, and pathway engineering.
The analysis of taxonomic distribution revealed a notable bias towards the phylum Pseudomonadota among ORF1 homologs, indicating a deeply conserved function for the encoded protein within this bacterial group. The dominance of Gammaproteobacteria, especially the family Pseudomonadaceae and the genus Acinetobacter, further supports this idea, pointing towards a shared evolutionary origin for the function of ORF1. The BLASTAlignPeptide search identified Acinetobacter calcoaceticus (strain PHEA-2) as the closest homolog to ORF1, showing a perfect match over a significant portion of the amino acid sequence, which suggests a close functional and structural relationship and strengthens the argument that ORF1 encodes a urocanate hydratase. Urocanate hydratases, crucial enzymes in the histidine degradation pathway, are distributed widely across diverse organisms. Urocanate hydratase activity has been reported in various bacterial species such as Pseudomonas putida [19], Mycobacterium smegmatis [20], and Bacillus subtilis [21]. The bacterium Serratia marcescens also exhibits urocanate hydratase activity [22]. The taxonomic range extends to mammalian tissues, including rat liver [23] and rabbit liver [24], emphasizing the importance of the histidine degradation pathway across different taxa.
Despite ORF1 exhibiting moderate sequence similarity to established Octopus urocanate hydratases, the Rfam mRNA search outcomes indicate a conserved function. The presence of remarkably low E-values and thorough sequence coverage between the query and target sequences suggests a potential evolutionary link between ORF1 and urocanate hydratases from Stegodyphus dumicola, potentially closer than with Octopus species. The recognition of ORF1 as a plausible urocanate hydratase in Stegodyphus dumicola and Octopus bimaculoides implies the conservation of this pathway in these invertebrates. The Rfam database serves as a crucial tool for investigating non-coding RNA (ncRNA) families, facilitating detailed RNA structure annotations and cross-species conservation analysis. Through a blend of manual curation and computational techniques, Rfam streamlines ncRNA annotation in genomic data, aiding in the identification of new ncRNA candidates and enhancing our understanding of ncRNA regulatory functions in biological processes. This platform expedites the exploration of novel RNA families, enriching our insights into ncRNA roles within biological systems [25, 26, 27].
The in-depth exploration of specific domains within ORF1 unveils potential functional and evolutionary implications. The N-terminal domain (IPR035400), crucial for urocanase structure and function, likely plays the same role in ORF1, while the Rossmann-like domain (IPR035085) hints at a catalytic mechanism for ORF1's putative urocanase activity. Comparisons with established urocanases, such as CATH domain 1uwkA01, offer evolutionary insights. Further structural and phylogenetic analyses could elucidate ORF1's lineage within the urocanase family, shedding light on ancestral links and adaptations [27, 28]. Research efforts, like the study by Boreiko et al. [29] on urocanate hydratase from Trypanosoma cruzi, provide critical structural insights for potential therapeutic interventions. Studies by Kessler et al. [28] and Lenz and Rétey [30] have deepened our understanding of urocanase's architecture and catalytic processes. Additionally, investigations such as the analysis of urocanase mechanisms through deuterium isotope effects by Egan et al. [31] contribute to deciphering the enzyme's catalytic activity and substrate interactions, enriching our comprehension of urocanase function and evolution.
The structural analysis of the ORF1 protein predicts a significant portion of alpha helices (32.60%) and beta strands (21.55%), classifying it as an alpha-beta protein, a class associated with varied functions like enzymatic activity, protein-protein interactions, and signal transduction [32]. The high coil content (45.86%) indicates flexible regions crucial for protein dynamics and conformational changes during function. With a substantial surface exposure of 46.96%, the ORF1 protein shows potential for interactions with other molecules, suggesting functional versatility due to its diverse amino acid composition. The presence of hydrophobic residues (Leu, 10.5%; Val, 8.3%) likely contributes to core stability, while polar residues (Thr, 7.2%; Gln, 6.6%; Asn, 5.5%) could engage in hydrogen bonding and interactions with solvents or polar molecules. Charged residues (Asp, 5.0%; Lys, 5.5%) may play roles in protein-protein interactions, enzymatic activity, or pH-dependent functions [33]. The absence of predicted disulfide bonds implies that ORF1 may not depend on these bonds for structural stability. However, considering the prediction confidence level (7%) and the potential for alternative disulfide bond arrangements not captured, further investigations using techniques such as mutational analysis or advanced structural determination methods like X-ray crystallography or cryo-electron microscopy are warranted to elucidate the presence and significance of disulfide bonds [33, 34].
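As a minimal illustration of how percentages of this kind are derived, the sketch below computes per-symbol percentages from a string. It applies equally to an amino acid sequence in one-letter code and to a per-residue secondary structure string. The function name `percent_composition` and both toy inputs are hypothetical; the ORF1 sequence is not reproduced here, and the prediction servers used in this study may round or count differently.

```python
from collections import Counter


def percent_composition(symbols: str) -> dict[str, float]:
    """Return the percentage of each symbol in a sequence-like string."""
    symbols = symbols.strip().upper()
    if not symbols:
        raise ValueError("empty sequence")
    counts = Counter(symbols)
    total = len(symbols)
    # Round to two decimals, matching the precision quoted in the text.
    return {sym: round(100.0 * n / total, 2) for sym, n in counts.items()}


# Toy amino acid sequence (hypothetical, not ORF1).
toy_seq = "MLLVTKDLNQ"
aa_pct = percent_composition(toy_seq)
print(aa_pct["L"])  # leucine share of the toy sequence

# Toy per-residue secondary structure string:
# H = helix, E = strand, C = coil (hypothetical assignment).
toy_ss = "CHHHHEEECC"
ss_pct = percent_composition(toy_ss)
print(ss_pct)
```

Applied to a real prediction, the helix, strand, and coil percentages should sum to approximately 100%, as the values reported for ORF1 do (32.60 + 21.55 + 45.86 = 100.01, the excess being rounding).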
The homology modeling approach successfully generated a high-quality model of the ORF1 protein with a predicted homo-dimeric structure. The high sequence identity (87.71%) between the chosen template (1uwk.1.A) and the ORF1 sequence provides confidence in the model's accuracy. This is further supported by the reliable model quality assessment scores (GMQE: 0.88, QMEANDisCo: 0.78 ± 0.05) [35, 36]. The identification of two conserved binding sites within the predicted ORF1 structure sheds light on its potential functional mechanisms. The first ligand, NAD+1, likely interacts non-covalently with a network of residues on chain A of the model. This suggests NAD+1 might play a crucial role in the catalytic activity of ORF1, potentially acting as a cofactor [37].
The second ligand, URO4 (urocanate), exhibits a more intricate binding mode. The presence of a single hydrophobic interaction with I.145 alongside multiple hydrogen bonds (including A:G.54, A:Q.131, and A:G.177) and water bridges (involving Y.52 and G.54) suggests a more specific and energetically favorable interaction between URO4 and the ORF1 binding pocket. These interactions potentially position URO4 for its role as a substrate within the enzymatic process catalyzed by ORF1 [31]. The observed homo-dimeric structure aligns with the findings for several templates (1uwk.1.A, 1uwl.1.A, etc.), suggesting that dimerization might be essential for ORF1 function. The presence of two separate binding sites, one for each monomer within the dimer, could indicate cooperative binding or sequential steps within the catalytic cycle [28]. Additionally, a comparison of the binding interactions of known ligands with those anticipated for ORF1 indicates potential NAD+ binding sites at positions 53–54 (GG), 129 (Q), and 177–178 (GM), in agreement with the predicted UniProtKB model F0KI08 (hutU, BDGL_002875) from Acinetobacter calcoaceticus (strain PHEA-2). This was reinforced by amino acid conservation analysis of the ORF1 protein through the ConSurf server (Fig. 4C), which showed that sites 53–54, 129, and 177–178 are the most highly conserved segments within the ORF1 sequence.
Phyre2's analysis of ORF1's secondary structure sheds light on its potential function. The presence of significant pockets, particularly the one harboring Ser104, Leu106, and Ala111-Tyr127, suggests a critical role in substrate binding or catalysis. Ser104's hydroxyl group likely forms hydrogen bonds, essential for anchoring the substrate or stabilizing intermediates during enzymatic reactions [38]. Neighboring Leu106 might influence the pocket's hydrophobicity, interacting with aromatic residues like Phe116 and Tyr127. This interplay could shape a unique cavity that governs substrate orientation within the pocket [39]. Interestingly, these findings regarding ORF1's active site align with established knowledge about urocanase enzymes. Urocanases share a conserved active site motif (GXGX2GX10G) indicative of the Rossmann fold, known for NAD+ binding, and two conserved cysteine residues [17]. In ORF1, the predicted binding site for NAD+1 and the presence of strategically positioned and conserved residues within the pocket are reminiscent of this signature urocanase active site. Taken together, the predicted homo-dimeric structure of ORF1, the identification of distinct binding sites for potential ligands (NAD+1 and URO4), and the presence of pockets with strategically positioned and conserved residues strongly suggest a functional active site within ORF1.
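The motif notation GXGX2GX10G can be read as a pattern in which X is any residue and the number after X is a repeat count. A minimal sketch of how such a pattern might be located in a protein sequence is given here. The function name `find_rossmann_like` and the toy sequence are hypothetical, and this plain regular expression is only an illustration, not the tool used in this study.

```python
import re

# GXGX2GX10G written as a regular expression: "." stands for any residue (X).
# The lookahead (?=...) allows overlapping occurrences to be reported.
ROSSMANN_LIKE = re.compile(r"(?=(G.G.{2}G.{10}G))")


def find_rossmann_like(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for each occurrence."""
    seq = seq.strip().upper()
    return [(m.start() + 1, m.group(1)) for m in ROSSMANN_LIKE.finditer(seq)]


# Toy sequence (hypothetical, not ORF1): one motif instance starting at residue 3.
toy = "MK" + "GAGSTG" + "A" * 10 + "G" + "KL"
hits = find_rossmann_like(toy)
for start, segment in hits:
    print(start, segment)
```

A match of this pattern is only a sequence-level hint; whether the segment actually forms a nucleotide-binding fold depends on the structural context.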
ConSurf's conservation analysis pinpointed a highly conserved region (residues 101–148) that encompasses the pocket identified through secondary structure prediction (residues 104–127). This overlap strengthens the argument that the pocket plays a critical role in ORF1's function, potentially acting as a substrate binding or catalytic site. Motif analysis using MEME and MAST software further bolsters this argument. A key motif, NWECFD/NWEHFD, was identified within ORF1 and exhibited significant alignment with the signature motif found in urocanate hydratase enzymes. This finding was further substantiated by a BLASTp search revealing 100% identity between the NWECFD motif and urocanate hydratases across various organisms, including bacteria, archaea, and eukaryotes. Additionally, the predicted binding site for NAD+1 in the ORF1 structure aligns with the established role of NAD+ as a cofactor in urocanate hydratase activity [40].
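The relationship between the two residue ranges can be checked with simple interval arithmetic. The sketch below uses the ranges reported here (101–148 for the conserved region and 104–127 for the pocket) with inclusive 1-based numbering; the helper name `overlap` is hypothetical.

```python
def overlap(a: tuple[int, int], b: tuple[int, int]):
    """Return the inclusive overlap of two residue ranges, or None if disjoint."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


conserved = (101, 148)  # ConSurf highly conserved region
pocket = (104, 127)     # pocket from the secondary structure analysis

shared = overlap(conserved, pocket)
shared_len = shared[1] - shared[0] + 1
pocket_len = pocket[1] - pocket[0] + 1

# True when the pocket lies entirely within the conserved region.
pocket_contained = shared_len == pocket_len
print(shared, shared_len, pocket_contained)
```

With these ranges, the shared span is 104–127 (24 residues), so the pocket lies entirely within the conserved region.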
Taken together, these observations strongly suggest the presence of a functional active site within ORF1, potentially possessing urocanate hydratase activity and utilizing NAD+ as a cofactor. While the NWECFD motif points towards a potential role in urocanate metabolism, the presence of the NWEHFN motif adds another layer of complexity. Although a BLASTp search on the NWEHFD motif identified a broader range of proteins with diverse functions, its conservation within ORF1 suggests a functional role. It is possible that this motif interacts with other protein regions or participates in a broader enzymatic mechanism alongside the putative urocanate hydratase activity indicated by the NWECFD motif [41].
Building upon the identification of a putative active site with urocanate hydratase activity, PROSCAN analysis revealed potential regulatory mechanisms for ORF1. Predicted phosphorylation sites for Protein Kinase C and Casein Kinase II, alongside N-myristoylation sites, suggest control of ORF1 function by post-translational modifications (PTMs) beyond its active site. These PTMs could modulate enzymatic activity, stability, localization, or interaction with other proteins, highlighting the potential for multi-layered regulation of ORF1 within the cell [42, 43].
The discovery of putative binding sites for regulators involved in nitrogen assimilation, including the Hut Nitrogen Assimilatory Cofactor binding site, σN promoter element, σ54 consensus sequence, and σ54 global nitrogen control element, points towards a potential role for ORF1 in nitrogen metabolism [17]. Furthermore, the detection of a putative RBS motif, −35 and −10 elements, and an AT-rich region clarifies how the transcriptional and translational machinery may interact with the ORF1 gene and its mRNA. These elements likely facilitate ribosome binding, RNA polymerase interaction, and subsequent ORF1 gene expression. One intriguing observation is the co-localization of the NWECFD and NWEHFN motifs with the −35 elements of the predicted promoter regions (DNA positions 177–195 and 333–347). This spatial arrangement presents two possible interpretations.
Firstly, the NWECFD and NWEHFN motifs might directly influence promoter activity. They could potentially modulate the binding or affinity of RNA polymerase to the promoter regions, thereby regulating ORF1 expression. Notably, the NWECFD motif's association with urocanate hydratase activity suggests a potential link to urocanate metabolism, which could be relevant in the context of the identified HutC binding site. Secondly, the overlapping locations might be coincidental, with the motifs serving independent functions unrelated to promoter activity. The NWECFD motif's association with urocanate hydratase activity and the diverse functionalities of NWEHFN identified by the MAST search support this possibility. Interestingly, the MAST search also found matches for NWEHFN in ECF RNA polymerase sigma factors, suggesting a potential connection to nitrogen metabolism.