Sequence Analysis
Essential genes and proteins are necessary for the existence of an organism, and identifying them contributes to a better understanding of the principles of life. A total of ten complete genome sequences of different OT strains are available in GenBank. Because the core gene set of the family Rickettsiaceae is highly conserved between the Boryong and Ikeda strains, these two complete genomes were selected for analysis. A general overview and the assembly statistics of the OT strains are shown in Table 1. All hypothetical protein (HP) sequences were retrieved and submitted to multiple functional annotation servers for biological function assignment. Pseudogenes (proteins with fewer than 100 amino acid residues) were excluded from this study to reduce misinterpretation in the functional annotation pipeline. Figure 1 depicts the whole work plan, including all the bioinformatics tools used. The main selection criteria were that a protein must contain more than 100 amino acid residues, be stable, and be predicted to be virulent. A total of 292 and 333 HPs were predicted from Ikeda and Boryong, respectively, of which 36 from Ikeda and 27 from Boryong fulfilled all the criteria (Fig. 1).
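The three selection criteria amount to a simple sequential filter. The Python sketch below illustrates that logic under stated assumptions: the record type `HypotheticalProtein` and its field names are hypothetical, the instability index is assumed to come from ProtParam, and the virulence call is assumed to be a precomputed yes/no flag from an external predictor. It is not the exact pipeline used in this study.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_LENGTH = 100         # keep only proteins with MORE than 100 residues
STABILITY_CUTOFF = 40.0  # instability index < 40 => predicted stable


@dataclass
class HypotheticalProtein:
    """Hypothetical record for one HP; field names are illustrative only."""
    locus_tag: str
    sequence: str
    instability_index: float  # assumed to come from ProtParam
    predicted_virulent: bool  # assumed output of a virulence predictor


def passes_criteria(hp: HypotheticalProtein) -> bool:
    """True if the HP is long enough, predicted stable and predicted virulent."""
    return (
        len(hp.sequence) > MIN_LENGTH
        and hp.instability_index < STABILITY_CUTOFF
        and hp.predicted_virulent
    )


def select_candidates(hps: Iterable[HypotheticalProtein]) -> List[HypotheticalProtein]:
    """Return the HPs that satisfy every criterion, in input order."""
    return [hp for hp in hps if passes_criteria(hp)]


if __name__ == "__main__":
    demo = [
        HypotheticalProtein("HP_A", "M" * 150, 32.5, True),   # kept
        HypotheticalProtein("HP_B", "M" * 80, 25.0, True),    # too short
        HypotheticalProtein("HP_C", "M" * 200, 45.1, True),   # unstable
        HypotheticalProtein("HP_D", "M" * 120, 30.0, False),  # not virulent
    ]
    print([hp.locus_tag for hp in select_candidates(demo)])  # ['HP_A']
```

Because each criterion is a separate predicate inside `passes_criteria`, the order of the checks does not change the result; only proteins meeting all three survive.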
The physicochemical properties of all 175 and 210 HPs from Ikeda and Boryong, respectively, were predicted; of these, 82 and 71 proteins were stable, and the values are listed in Table 2. In strain Ikeda, the molecular weight ranged from 11.022 kDa to 262.975 kDa and the pI from 3.88 to 10.47. The isoelectric point (pI) is the pH at which a protein carries no net charge and therefore does not move in a direct current electric field. The GRAVY value ranged from −1.098 to 0.845, and 30.76% of the proteins had a positive index and were therefore classed as hydrophobic; the lower the GRAVY value, the more hydrophilic the protein. The extinction coefficients of the HPs at 280 nm, computed with ExPASy's ProtParam tool, ranged from 1490 to 142605 M⁻¹ cm⁻¹. A higher content of Trp, Tyr and Cys results in a higher extinction coefficient, and these computed values can be used in quantitative studies of protein-protein and protein-ligand interactions in solution. The instability index ranged from 10.33 to 39.99; a protein with an instability index below 40 is predicted to be stable.
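GRAVY and the extinction coefficient are both simple functions of amino acid composition. As an illustrative sketch (not ProtParam's own code), GRAVY can be computed as the mean Kyte-Doolittle hydropathy over all residues, and the 280 nm extinction coefficient with the weights given in the ProtParam documentation: 5500 per Trp, 1490 per Tyr and 125 per cystine (disulfide-bonded Cys pair). The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Absorption weights at 280 nm, in M^-1 cm^-1
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues.

    Raises KeyError for non-standard residue letters.
    """
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def extinction_coefficient_280(seq: str, all_cys_paired: bool = True) -> int:
    """Extinction coefficient at 280 nm (M^-1 cm^-1).

    If all_cys_paired is True, every two Cys are counted as one cystine;
    otherwise all Cys are assumed reduced and contribute nothing.
    """
    seq = seq.upper()
    n_cystine = seq.count("C") // 2 if all_cys_paired else 0
    return EXT_TRP * seq.count("W") + EXT_TYR * seq.count("Y") + EXT_CYSTINE * n_cystine


if __name__ == "__main__":
    print(round(gravy("ACDW"), 3))             # -0.025
    print(extinction_coefficient_280("WYCC"))  # 7115
```

With these weights, a protein containing a single Tyr and no Trp or cystine gives 1490 M⁻¹ cm⁻¹, which coincides with the lower bound of the range reported above.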
The predicted subcellular localizations are given in Supplementary Table T1. Of the 35 HPs in Ikeda, 22 (62.86%) were predicted to be cytoplasmic, 4 (11.43%) cytoplasmic membrane, 3 (8.57%) outer membrane, 2 (5.71%) inner membrane, 3 (8.57%) extracellular and 1 (2.86%) periplasmic proteins. Of the 27 HPs in Boryong, 14 (51.85%) were cytoplasmic, 3 (11.11%) cytoplasmic membrane, 6 (22.22%) outer membrane, 1 (3.70%) inner membrane and 3 (11.11%) extracellular proteins. The SOSUI server was used to distinguish whether the hypothetical and conserved proteins in this study were soluble or transmembrane proteins. SOSUI predictions rely on four properties of the amino acid sequence: hydropathy index, amphiphilicity index, charge, and sequence length. Membrane proteins show hydrophobic segments in their hydropathy profile and contain at least one transmembrane helix. The analysis revealed that 17 and 21 HPs were soluble proteins, of which 7 and 16 had transmembrane regions, in Ikeda and Boryong, respectively. In Ikeda, the maximum number of transmembrane helices was two, found in OTT_0001 (“IFLVIIVAIINFITVANATASIC” and “LFKALWIFGIIIVLSVLAIKTTI”). In Boryong, OTBS_0978, OTBS_2105 and OTBS_0302 each contained the maximum of three transmembrane helices. The number of transmembrane domains in each protein and their N- and C-terminal positions are shown in Table 3.
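SOSUI's own scoring is not reproduced here. As a simpler, swapped-in illustration of how a hydropathy profile reveals membrane-spanning segments, the sketch below uses the classic Kyte-Doolittle sliding window, treating any 19-residue window with a mean hydropathy above 1.6 as a candidate transmembrane segment. The window size and threshold are the commonly quoted defaults for that method, not parameters from this study, and the function names are hypothetical.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy scale
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of each full window; element i covers residues i..i+window-1 (0-based)."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge overlapping or adjacent windows whose mean exceeds the threshold.

    Returns 1-based inclusive (start, end) residue ranges.
    """
    segments: List[Tuple[int, int]] = []
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        if mean <= threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], end)  # extend the current segment
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    helix = "IFLVIIVAIINFITVANATASIC"  # first OTT_0001 segment from the text
    print(round(hydropathy_profile(helix)[0], 2))  # 2.42
    print(candidate_tm_segments("DEKR" * 5 + helix + "DEKR" * 5))
```

For example, the first 19 residues of the OTT_0001 segment have a mean hydropathy of about 2.42, well above the 1.6 cut-off, so this simpler method also flags it as membrane-spanning.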
Functional Annotation
Generally, proteins are composed of one or more functional and/or structural regions, commonly termed domains. Conserved domains are functional units within a protein that act as building blocks in molecular evolution and recombine in various arrangements to produce proteins with different functions. For functional analysis, the conserved domains of the HPs in the present study were identified (Table 4).
Of the 14 HPs examined for conserved domains, only OTT_0680 possessed a specific domain, a bacterial SH3 domain, and was accordingly classified in the SH3 domain superfamily (clan description: Src homology-3 domain); the other proteins showed no significant domains. The functions of 66 HPs were assigned with high confidence and classified into thirteen types. Many HPs possess enzymatic activities (phosphatase, transferase, hydrolase, kinase, peptidase, dehydrogenase, isomerase, nuclease, lyase, ligase, oxidase and endonuclease), while others are binding, transporter, secretory and regulatory proteins. A thorough understanding of these enzymes and proteins is required to understand the molecular basis of pathogenesis and host-pathogen interaction. Each group is analyzed in detail below (Table 4, Fig. 2).
Phosphatase
A decrease in host phosphate concentration may increase the virulence of pathogens: phosphatases secreted by pathogens deplete phosphate in the local environment of the infection site, enhancing their pathogenicity. OTT_0636 and OTBS_1498 were predicted to have protein tyrosine phosphatase activity as their molecular function and peptidyl-tyrosine dephosphorylation as their biological process.
Transferase
A transferase is an enzyme that catalyzes the transfer of a specific functional group (e.g. a methyl or glycosyl group) from one molecule to another. A total of 11 HPs (OTT_1526, OTT_0512, OTT_0378, OTT_0268, OTBS_0254, OTBS_0398, OTBS_1016, OTBS_0282, OTBS_1311, OTBS_0926 and OTBS_0302) were predicted to have transferase activity as their molecular function, including acyl group transfer, and to be associated with RNA phosphodiester bond hydrolysis. Bacterial interactions with the host depend mostly on the bacterial glycome, which is largely determined by glycosyltransferases (GTs).
Hydrolase
A hydrolase is an enzyme that catalyzes the hydrolysis of a chemical bond. The genomes of both Gram-negative and Gram-positive bacteria encode a wide variety of hydrolases responsible for the specific cleavage of different peptidoglycan (PG) bonds. Hydrolases are also involved in several other critical functions, including PG maturation, turnover, recycling, autolysis, and cleavage of the septum during cell division (Lee et al. 2013). Four HPs, OTT_0630, OTT_1111, OTBS_1492 and OTBS_0057, showed hydrolase activity.
Kinase
Kinases play essential roles in cell cycle regulation, filamentous growth and signal transduction. We identified four proteins with kinase activity: OTT_0078, OTT_1311, OTT_1110 and OTBS_1340. Bacteria are known to possess protein serine/threonine kinase, transferase and dephosphorylation activities, through which significant post-translational modifications of native proteins occur on the protein surface. These mechanisms play critical roles in intracellular signal transduction cascades and in switching enzymatic activity. Cell proliferation, programmed cell death (apoptosis), cell differentiation and embryonic development are all regulated by serine/threonine kinase receptors.
Peptidase
Peptidases are enzymes that cleave small peptides, typically inactivating them, by hydrolyzing their peptide bonds. They are found on the surface of many distinct cell types, with the catalytic site exposed mainly to the outside. Endopeptidases cleave peptide bonds between non-terminal amino acids, whereas exopeptidases cleave bonds at terminal amino acids; peptidases are further classified according to the functional groups in their active site. OTT_0186 and OTBS_2105 are cysteine-type endopeptidases, OTT_0896 has aspartic-type endopeptidase activity, and OTBS_0726 and OTBS_2058 are signal peptidases.
Isomerase
One HP, OTT_0464, was identified as an isomerase in this study. Alanine racemase isomerizes L-alanine to D-alanine, which is essential for cell wall (peptidoglycan) biosynthesis. The alanine racemase monomer is composed of two domains: an eight-stranded alpha/beta barrel at the N-terminus and a C-terminal domain composed mainly of beta-strands.
Ligase
Two HPs, OTT_0680 and OTT_0309, were predicted to be ligases, enzymes that catalyze the joining of two large molecules by forming a new chemical bond. OTT_0680 and OTT_0309 showed aminoacyl-tRNA ligase activity and acetate-CoA ligase activity. Aminoacyl-tRNA ligases attach the correct amino acid to its cognate tRNA, thereby linking the genetic code to amino acid identity. The charged tRNAs are then delivered to the ribosome, where they pair with codons on the mRNA, allowing protein synthesis to proceed.
Oxidoreductase
HP OTT_1324 was identified as an oxidoreductase with cytochrome-c oxidase activity. Cytochrome c, CutA and copper chaperone domains are the main domains found in identified Cu-binding proteins, and such proteins play significant roles in metabolic processes, stress response and protein folding (Festa and Thiele 2011).
Regulatory
Gene regulation in prokaryotes and eukaryotes encompasses a diverse set of strategies for producing the desired gene product. In bacteria, regulation forms a complex network that mediates the expression of multiple transcriptional units and likely sustains microbial pathogenicity, growth and survival. The HPs OTT_1450, OTT_0508, OTT_1011, OTT_0140, OTT_0174 and OTBS_0047 were associated with transcriptional regulation, uroporphyrinogen-III synthesis, DNA-templated transcription and termination, DNA replication, aspartic-type endopeptidase activity with protein processing, and translation initiation factor activity, respectively. Blocking these proteins may disrupt normal cellular processes and thereby reduce bacterial pathogenicity.
Secretory
Protein export and secretion are essential activities for all living organisms. Secreted or membrane-associated proteins are critical for cell survival and pathogenicity in a broad range of ways, including cell structure maintenance, motility, cell attachment, metabolite transport, cell-cell interactions and toxin export. The HPs OTT_1381, OTT_0417 and OTT_0937 were identified as secretory proteins. In eukaryotic cells, secretory proteins enter the endoplasmic reticulum, an interconnected membrane network that regulates protein translation, protein folding, post-translational modification and the forward trafficking of molecules such as lipids and proteins.
Binding
Thirteen HPs were found to be binding proteins: eight DNA-binding proteins, two ion-binding proteins (OTT_1669 and OTBS_1226, with calcium ion and zinc ion binding, respectively), two ATP-binding proteins (OTT_0690 and OTT_1192) and one NAD-binding protein. HPs resembling DNA-binding proteins may aid in DNA replication, repair and recombination and may affect a variety of growth parameters. DNA binding protein-1 plays a crucial role in virulence-related characteristics such as aggregate formation and intracellular proliferation in OT.
Transporter protein
Bacteria contain various transport proteins that import and export substances such as nutrients, ions, metabolites and amino acids across the cell membrane, allowing them to expel unnecessary by-products and adjust the cytoplasmic concentrations of protons and salts required for growth and development. Bacterial transport proteins mediate both passive and active transport of small solutes across membranes. Two HPs, OTT_0736 and OTBS_1977, were identified as being involved in protein and proton (hydrogen ion) transport.
Structure Analysis and Assessment
In strain Ikeda, HPs OTT_0508, OTT_1526 and OTT_0464 were organized in long alpha-helical regions. In strain Boryong, OTBS_0057, OTBS_0096, OTBS_0161, OTBS_0806, OTBS_0856 and OTBS_1110 were organized in alpha-helical structures interrupted by coil regions and beta sheets, whereas proteins such as OTT_0690, OTT_0732, OTBS_0787, OTBS_0946 and OTBS_1546 were well organized in long coil regions and large strands. Cartoon models of the secondary structures of all proteins are shown in Fig. S1 and Table S2. For the 3D structures of the HPs, a total of four models were generated by the SWISS-MODEL server. Based on the GMQE and QMEAN4 scores, the best templates were selected for tertiary structure prediction (Fig. 3).
ERRAT, a validation program that analyzes the characteristic non-bonded interactions between different categories of atoms, was used to check the statistical accuracy of the models. The overall quality factor ranged from 76% to 100%, indicating good model quality. In the Verify3D analysis, the averaged 3D-1D scores of the protein residues were above 0.2. The quality and reliability of the 3D models were further assessed using the Ramachandran plot, Z-score and Q-value (Wiederstein et al. 2007, Schwede et al. 2003). The Z-score validates model quality by comparison with experimentally resolved protein structures; the Z-scores of the templates and query models were derived from ProSA-web. The Z-scores of the homology models ranged from 0.93 to 6.5, which is acceptable for model building (Fig. S2, Fig. 3). The stereochemical quality of the HP models was validated by Ramachandran plot analysis using RAMPAGE, which found 88% to 99% of residues in favored regions and 0.85% to 3.3% in outlier regions, suggesting that the models are of high quality and reliability (Fig. S4). Together, the Z-score, Q-value and Ramachandran plot results support the accuracy of all the homology models.
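Ramachandran analysis starts from the backbone torsion angles φ (C(i-1)-N-Cα-C) and ψ (N-Cα-C-N(i+1)). The sketch below shows only that first step, computing signed dihedral angles from atomic coordinates with the standard atan2 formula; it does not reproduce RAMPAGE's classification into favored, allowed and outlier regions, which relies on empirical reference distributions. The input layout (one dict of `N`, `CA` and `C` coordinates per residue) and the function names are assumptions for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees (IUPAC sign convention)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def backbone_phi_psi(
    residues: Sequence[Dict[str, Vec]],
) -> List[Tuple[Optional[float], Optional[float]]]:
    """(phi, psi) per residue; phi is undefined for the first residue, psi for the last."""
    angles: List[Tuple[Optional[float], Optional[float]]] = []
    last = len(residues) - 1
    for i, res in enumerate(residues):
        phi = dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"]) if i > 0 else None
        psi = dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"]) if i < last else None
        angles.append((phi, psi))
    return angles


if __name__ == "__main__":
    # Toy geometry: rotate the last point about the p1-p2 axis
    p0, p1, p2 = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    print(dihedral(p0, p1, p2, (1.0, 1.0, 0.0)))  # 0.0 (cis)
    print(dihedral(p0, p1, p2, (1.0, 0.0, 1.0)))  # 90.0
```

The percentages of residues in favored and outlier regions reported above are then obtained by counting (φ, ψ) pairs that fall inside each region of the reference plot.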
Protein-protein Interaction Network
STRING was used to predict functional association networks between the hypothetical and conserved proteins and their interaction partners. Partner proteins with a score above 0.5 were included, except for two Ikeda proteins, OTT_0001 and OTT_0636, whose partners had lower scores (0.431 and 0.442, respectively). Although many proteins perform their functions independently, the vast majority interact with other proteins for proper biological activity, so characterizing protein-protein interactions is critical to understanding protein function and cell biology. The interaction analysis revealed that some HPs are involved in essential cellular processes such as transport across membranes, biosynthesis of molecules and translational regulation. The STRING results showed that the query proteins interact with other functionally characterized as well as uncharacterized proteins (Table 5, Fig. 4).
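The score cut-off can be expressed as a small filter over query-partner pairs. In this sketch, the tuple layout `(query, partner, combined_score)` and the function name are hypothetical rather than STRING's export format; queries with no partner above the cut-off are reported separately with their best score, mirroring how OTT_0001 and OTT_0636 were handled as exceptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (query, partner, combined score in 0..1)


def filter_partners(
    edges: Iterable[Edge], threshold: float = 0.5
) -> Tuple[Dict[str, List[Tuple[str, float]]], Dict[str, float]]:
    """Split STRING-style edges into confident partners and low-score queries.

    Returns (kept, below):
      kept:  query -> partners scoring above threshold, best first
      below: query -> best partner score, for queries with no partner above threshold
    """
    by_query: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for query, partner, score in edges:
        by_query[query].append((partner, score))

    kept: Dict[str, List[Tuple[str, float]]] = {}
    below: Dict[str, float] = {}
    for query, partners in by_query.items():
        strong = sorted((p for p in partners if p[1] > threshold),
                        key=lambda p: p[1], reverse=True)
        if strong:
            kept[query] = strong
        else:
            below[query] = max(score for _, score in partners)
    return kept, below


if __name__ == "__main__":
    demo = [("HP_1", "partner_a", 0.91), ("HP_1", "partner_b", 0.42),
            ("HP_2", "partner_c", 0.431)]
    kept, below = filter_partners(demo)
    print(kept)   # {'HP_1': [('partner_a', 0.91)]}
    print(below)  # {'HP_2': 0.431}
```

Reporting low-score queries instead of silently dropping them keeps such exceptions visible so they can be reviewed case by case.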
Host Non-homology Analysis
An ideal drug target must be essential to, and specific for, the pathogen. To reduce the likelihood of unfavorable cross-reactivity of a potential drug with host proteins, the target should not have close homologs in the human proteome. A host non-homology analysis was therefore performed: the HPs that had been functionally characterized with high confidence were searched against the human proteome using BLASTp with an E-value threshold of 0.0001. None of the proteins produced significant hits, indicating that all are non-homologous to human proteins; they can therefore be considered prospective therapeutic targets.
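A non-homology screen of this kind can be run with NCBI BLAST+ (for example, `blastp -query hps.fasta -db human_proteome -evalue 0.0001 -outfmt 6 -out hits.tsv`, where the file and database names are placeholders) and the tabular hits then parsed. The sketch below assumes the default 12-column `-outfmt 6` layout, in which the E-value is the 11th column, and treats a hit at or below the cut-off as significant; query IDs without such a hit are reported as non-homologous. The function name is hypothetical.

```python
from typing import Iterable, List

EVALUE_CUTOFF = 1e-4  # threshold used in the text (0.0001)


def non_homologous(query_ids: Iterable[str], blast_tab_lines: Iterable[str],
                   cutoff: float = EVALUE_CUTOFF) -> List[str]:
    """Return query IDs with no human hit at or below the E-value cutoff.

    Expects default BLAST+ tabular output (-outfmt 6): 12 tab-separated
    columns with the query ID first and the E-value in column 11.
    """
    homologous = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= cutoff:
            homologous.add(fields[0])
    return [q for q in query_ids if q not in homologous]


if __name__ == "__main__":
    hits = [
        "HP_1\thuman_prot_x\t41.0\t120\t65\t3\t1\t118\t10\t127\t2e-12\t88.2",
        "HP_2\thuman_prot_y\t28.5\t60\t40\t2\t5\t62\t300\t357\t0.35\t30.1",
    ]
    print(non_homologous(["HP_1", "HP_2", "HP_3"], hits))  # ['HP_2', 'HP_3']
```

Queries with no line at all in the BLAST output (such as `HP_3` above) are counted as non-homologous, which matches the case where no hit passes BLAST's own E-value filter.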
Active Site Prediction
CASTp 3.0 predicted unique active sites in all protein models (Table 6). The top active sites of each model were identified based on pocket area and volume. Figure 5 depicts the predicted active sites of the proteins together with their amino acid residues.
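Ranking the CASTp pockets by size is a simple sort. The `Pocket` record below is a hypothetical stand-in for values read from CASTp output (area in Å², volume in Å³), and ordering by area first and volume second is an assumption; the text does not state how the two measures were combined.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Pocket(NamedTuple):
    """Hypothetical record for one CASTp pocket."""
    pocket_id: int
    area: float    # pocket area, Å^2
    volume: float  # pocket volume, Å^3
    residues: Tuple[str, ...] = ()


def top_pockets(pockets: Sequence[Pocket], n: int = 1) -> List[Pocket]:
    """Largest pockets first: rank by area, break ties by volume (assumed ordering)."""
    return sorted(pockets, key=lambda p: (p.area, p.volume), reverse=True)[:n]


if __name__ == "__main__":
    demo = [Pocket(1, 120.5, 80.2), Pocket(2, 410.0, 355.7), Pocket(3, 410.0, 290.1)]
    print([p.pocket_id for p in top_pockets(demo, n=2)])  # [2, 3]
```

Swapping the key to `(p.volume, p.area)` would instead prioritize volume; which ordering is appropriate depends on how the active sites are to be used downstream.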