Insilico Analysis of pathogenic genes as a major rescue of Candida albicans

doi:10.21203/rs.3.rs-2057050/v1


Research Article

This work is licensed under a CC BY 4.0 License

Version 1

C. albicans, a polymorphic, opportunistic pathogen of humans, resides commensally in healthy individuals. It exists as yeast, hyphae, pseudohyphae or chlamydospores. This polymorphism is associated with a survival strategy, gained through evolution, that makes C. albicans a versatile organism able to survive in extreme microenvironments. In the present study, an attempt was made to analyze in silico the regulation of genes responsive to the yeast-to-hyphal transition. The regulatory genes EFG1 and NRG1 were analyzed through identification of true orthologs, characterization of physical and chemical properties, phylogenetics, active site prediction, and pharmacophore design aimed at docking. These investigations help in understanding the significance of these genes in the regulation of morphogenesis and virulence in C. albicans, with a view to potential target and pharmacophore design.

Molecular docking was used to analyze the interaction between the molecules and their respective targets, and the top compounds were picked on the basis of binding energy computed with the virtual screening tool VINA. All 1586 ligands screened for EFG1 (pharmacophore screening) were docked in the active site. ZINC31165359 showed the best interaction with EFG1, with the lowest binding energy of -11.3 kcal/mol. A total of 800 ligands screened by structural similarity were docked in the active site of NRG1, and ZINC20134767 & ZINC20134767 showed the best results for NRG1, with a binding energy of -7.4 kcal/mol. In conclusion, the computational tools used in the present study are useful for finding new hits against disease targets, which can help in the development of potential drugs.

Mycology

Bioinformatics

Biophysics

C.albicans

EFG1

NRG1

pharmacophore

Molecular docking.

I. ORTHOLOGOUS DETECTION BY BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi)

Orthologs are genes that have diverged from the same ancestor after a speciation event. In other words, they can be described as the ‘same genes’ present in different organisms. This evolutionary relationship implies that the genes diverged during the evolution of different species but tend to keep their original functions. The most common approach to orthology detection is RBH (Reciprocal Best Hits): two genes present in two different organisms are deemed orthologs if their protein products find each other as the best hit in the opposite genome.

The search for true orthologs is done in the following two phases:

Database search phase- In this step, the targets most likely to produce a meaningful alignment with the query are chosen. Refer to Table 1 for the ortholog organisms obtained from the ‘Candida genome database’ (http://www.candidagenome.org/) [14].
Alignment phase- In this phase, the most promising targets in the database are aligned to the query sequence and scored. The Basic Local Alignment Search Tool (BLAST) is one of the common algorithms used to find homologs. During the database search phase, BLAST decomposes the query sequence into small words [15]. These words are then compared to words in the database in order to find significant target sequences for alignment. NCBI’s BLASTP was used with a maximum E-value threshold of 3, excluding the organism Candida albicans, models and uncultured/environmental sample sequences [15]. Refer to Table 3 for the BLASTP results.

RBH detection and orthologous cluster generation by PROTEINORTHO:

The reciprocal (or bidirectional) best hit method is a widely used way to determine orthologous relationships between proteins. The “hit score” between a pair of genes is the sum of the lengths of all matched regions between them. For each gene in one genome, the gene in the other genome that shares the greatest hit score with it is defined as its (single-directional) best hit. If two genes are mutual best hits, the pair is defined as a reciprocal (or bidirectional) best hit, determined by the given group of maximal matches.
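The reciprocal best hit rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the Proteinortho implementation: the gene names and hit scores are hypothetical, and ties between equal scores are not handled.

```python
def best_hits(scores):
    """Map each query gene to the target gene with the highest hit score."""
    return {query: max(targets, key=targets.get)
            for query, targets in scores.items() if targets}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical hit scores between genome A and genome B (illustration only).
a_vs_b = {"a1": {"b1": 250.0, "b2": 40.0}, "a2": {"b1": 90.0, "b2": 60.0}}
b_vs_a = {"b1": {"a1": 245.0, "a2": 88.0}, "b2": {"a2": 61.0, "a1": 35.0}}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input, a2 finds b1 as its best hit, but b1 prefers a1, so only the pair (a1, b1) is reported as reciprocal.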

Proteinortho (https://gitlab.com/paulklemm_PHD/proteinortho) is a tool that calculates RBH and groups proteins by orthologous relationship [16]. It generates an E-value and bit score for every pair of sequences and then clusters the reciprocal best hits into groups, also known as orthology groups. The tool reports a parameter known as the algebraic connectivity: the higher the algebraic connectivity, the stronger the orthologous relationship between the proteins. Proteinortho clustered all the input protein sequences into one group; the algebraic connectivity values reported are 0.998 and 0.398 [16]. The Proteinortho result is given in text format [16].

II. PHYSICOCHEMICAL ANALYSIS BY EXPASY-PROTPARAM

ProtParam is a physicochemical analysis tool for a protein stored in Swiss-Prot or TrEMBL or for a user-provided protein sequence. The molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity were computed using ProtParam (https://web.expasy.org/protparam/) [17].

III. PHYLOGENETIC ANALYSIS AND RELATION ESTABLISHMENT WITH EVOLUTIONARY HISTORY OF PATHOGENIC LINEAGE FOR EFG1 & NRG1

Phylogenetic tree construction by MEGA- Multiple sequence alignment for EFG1 (Figure 1) & NRG1 (Figure 2) was done in MEGA using ClustalW [18, 19]. The tree was generated using the statistical method Maximum Likelihood and a JTT matrix-based model. The substitution type was kept as amino acid. Initial tree(s) were obtained by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with the superior log likelihood value [18, 19].

By keeping the following parameters the tree was generated:

GAP opening penalty as 10.0
GAP extension penalty as 0.20
Delay Divergent Cutoff as 30%

The Parameters used in tree construction for EFG1 are as follows:

Statistical method- Neighbor-Joining method.

Substitution type- Amino acid.
Model/Method- p-distance.
Rate among site- Uniform Rates.

The optimal tree with the sum of branch length = 1.61318370 is shown. Evolutionary distances were computed using the p-distance method and are in units of the number of amino acid differences per site. This analysis involved 11 amino acid sequences.
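The p-distance used here is simply the proportion of aligned sites at which two sequences differ. The sketch below shows that calculation for one pair of aligned sequences; skipping gapped positions is an assumption made for illustration, and the second example sequence is a hypothetical variant of the CDGIYSFR motif, not data from this study.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap are skipped (an assumption;
    other gap treatments are possible).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Two of eight sites differ, so the distance is 2 / 8.
print(p_distance("CDGIYSFR", "CDGLYSFK"))
```

Computing this value for every pair of sequences gives the distance matrix from which a Neighbor-Joining tree is built.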

The Parameters used in tree construction for NRG1 are as follows:

Statistical method- Maximum Likelihood and JTT (Jones-Taylor-Thornton) matrix-based model. Initial tree(s) were obtained by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with the superior log likelihood value.
Substitution type- Amino acid.
Model/Method- JTT model
Rate among site- Uniform Rates.

IV. STRUCTURE PREDICTION AND ANALYSIS OF THREE DIMENSIONAL STRUCTURES FOR ACTIVE SITE PREDICTION AND DRUG DESIGNING.

I-TASSER: 3-dimensional structures of target proteins

For the required 3-dimensional structures of the EFG1 & NRG1 proteins, I-TASSER (https://zhanglab.ccmb.med.umich.edu/I-TASSER/), a hierarchical protocol for automated protein structure prediction and structure-based function annotation, was used [20].

The final model selected by I-TASSER is based on pair-wise structure similarity. The confidence of each model is quantitatively measured by the C-score, which is calculated from the significance of the threading template alignments and the convergence parameters of the structure assembly simulations. The C-score is typically in the range [-5, 2], where a higher value signifies a model with higher confidence and vice versa. The model has an estimated TM-score = 0.49±0.15 and an estimated RMSD = 11.9±4.4Å, calculated from the C-score of -0.56 and the protein length, following the correlation observed between these quantities [20].
Discovery Studio: Discovery Studio is a single, unified, easy-to-use graphical interface for drug design and protein modeling research. It contains established gold-standard applications (e.g., Catalyst, MODELER, CHARMm, etc.) as well as cutting-edge science to address today’s drug discovery challenges [21, 22]. Here it was used to visualize the 3D image of the homology model.
ERRAT Analyses: The statistics of non-bonded interactions between different atom types are analyzed, and the value of the error function versus the position of a 9-residue sliding window is plotted [22].

COACH: Active site prediction

COACH is a meta-server approach to protein-ligand binding site prediction. Based on the structure of the input target protein, COACH compares binding-specific substructures and sequence profiles and generates complementary ligand binding site predictions using two comparative methods, TM-SITE and S-SITE [23].

Protein modeling method:

For protein modeling, I-TASSER was used. I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation.

a. I-TASSER modeling starts from the structure templates identified by LOMETS from the PDB library. The top 10 template-query alignments generated by LOMETS are tabulated (Tables 8 and 12). The significance of a template is measured by the Z-score, i.e. the difference between the raw and average scores in units of standard deviation. A Z-score > 1 indicates a good alignment [24].
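The Z-score definition given above (difference between the raw and average scores, in units of standard deviation) can be written out as follows. This is a sketch of the generic formula only: the raw scores are hypothetical, the population standard deviation is an assumption, and any program-specific normalization applied by LOMETS is not reproduced.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Return (raw - mean) / standard deviation for every raw score."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("all scores are identical; Z-score is undefined")
    return [(score - mu) / sigma for score in raw_scores]


# Hypothetical raw alignment scores for eight templates.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
for score, z in zip(raw, z_scores(raw)):
    flag = "good" if z > 1 else "-"
    print(score, round(z, 2), flag)
```

Only the template whose score lies more than one standard deviation above the mean is flagged, which mirrors the "Z-score > 1" rule of thumb stated in the text.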

4.3.1 Model Validation- PDBsum (PROCHECK):

Pharmacophore modeling, selection and screening

PharmaGist: PharmaGist was used for pharmacophore modeling. When the protein structure is unknown, the ligand-based approach can suggest possible pharmacophore queries based on a set of aligned active compounds. Two methods can be used to create optimal ligand-based pharmacophore models: common feature alignment and activity-based 3D QSAR modeling. DS Catalyst Hypothesis, Phase, and MOE are commonly used for this purpose [26, 27, 28].

ZINCPharmer: ZINCPharmer (http://zincpharmer.csb.pitt.edu/) provides a mechanism for deriving an initial pharmacophore hypothesis directly from structures within the PDB (Protein Data Bank) [29]. Pharmacophore screening was done with ZINCPharmer, and ligands matching the pharmacophore were selected [29].

SwissSimilarity: A Web Tool for Low to Ultra High Throughput Ligand-Based Virtual Screening: SwissSimilarity is a new web tool for rapid ligand-based virtual screening of small to unprecedented ultralarge libraries of small molecules. Predictions can be carried out on-the-fly using six different screening approaches, including 2D molecular fingerprints as well as superpositional and fast nonsuperpositional 3D similarity methodologies. The user interface and backend have been designed for simplicity and ease of use, to provide proficient virtual screening capabilities to specialists and nonexperts in the field. SwissSimilarity is accessible free of charge and without login at http://www.swisssimilarity.ch [30].
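To make the idea of 2D fingerprint similarity concrete, the sketch below scores two fingerprints with the Tanimoto coefficient. This is an assumption for illustration: the text does not state which similarity measure was used, and the fingerprints here are hypothetical sets of "on" bit positions rather than output from any real fingerprinting tool.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Hypothetical fingerprints: two shared bits out of five distinct bits.
query = {1, 4, 7, 9}
candidate = {1, 4, 8}
print(tanimoto(query, candidate))
```

Ranking a library by such a score against a reference ligand is the general shape of a similarity screen; the value runs from 0 (no shared bits) to 1 (identical fingerprints).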

I. TOOLS/SOFTWARE

Molecular modeling for protein structure prediction and validation, pharmacophore design and screening, and molecular docking were done using utilities such as MODELLER, PDBsum, SwissSimilarity, PharmaGist, ZINCPharmer, COACH and AutoDock Vina [24-30]. All the images were generated from the tools themselves.

Preparation of target and molecules/ligands

Polar hydrogens were added to the validated protein structures, which were then converted into pdbqt format; a grid was defined in a text file based on the predicted active site. All the molecules/ligands for the EFG1 and NRG1 targets were obtained after screening, refined further by energy minimization at both the 2- and 3-dimensional level, and then converted into pdbqt format.
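One way to define such a grid in text format is to place a box around the predicted active-site atoms and write it out with the `receptor`, `ligand`, `center_x/y/z` and `size_x/y/z` keys that AutoDock Vina reads from a configuration file. The sketch below is an illustration only: the coordinates, file names and padding value are hypothetical and are not the settings used in this study.

```python
def docking_box(coords, padding=8.0):
    """Return (center, size) of a box enclosing the coordinates plus padding."""
    xs, ys, zs = zip(*coords)
    center = tuple(round((min(v) + max(v)) / 2, 3) for v in (xs, ys, zs))
    size = tuple(round(max(v) - min(v) + 2 * padding, 3) for v in (xs, ys, zs))
    return center, size


def vina_config(receptor, ligand, center, size):
    """Build the text of a Vina-style configuration file."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    lines += [f"center_{axis} = {value}" for axis, value in zip("xyz", center)]
    lines += [f"size_{axis} = {value}" for axis, value in zip("xyz", size)]
    return "\n".join(lines) + "\n"


# Hypothetical coordinates (in angstroms) of two active-site atoms.
site_atoms = [(10.0, 20.0, 30.0), (14.0, 26.0, 32.0)]
center, size = docking_box(site_atoms)
print(vina_config("target.pdbqt", "ligand.pdbqt", center, size))
```

The padding controls how much room the search has around the pocket; a larger box is more permissive but slower to search.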

Ligand-based virtual screening/molecular docking studies

All the ligands were docked into the active pocket of the respective target protein, EFG1 or NRG1, using AutoDock Vina [31]. Computer-aided protein–ligand binding predictions are a valuable help in drug discovery. Protein–ligand docking programs generally consist of two main components: a scoring function and a search algorithm [31]. Active site prediction for EFG1 (Q59X67) and NRG1 (Q5A0E5) was done using COACH. The active site predicted for EFG1 is the ligand binding site pocket containing residues 256 and 261. For NRG1, it is the pocket containing residues 230, 233, 246 and 250 (the active site pocket report is given as supplementary data). After docking, the best 5 docked ligands were selected based on binding energy and later analyzed for interaction with each target [31].
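Selecting the best docked ligands amounts to sorting by binding energy, where more negative values indicate stronger predicted binding. A minimal sketch follows; the ligand names and energies are hypothetical placeholders, not results from this study.

```python
def top_ligands(energies, n=5):
    """Return the n ligands with the lowest (most negative) binding energy."""
    return sorted(energies.items(), key=lambda item: item[1])[:n]


# Hypothetical binding energies in kcal/mol, keyed by placeholder ligand name.
energies = {
    "lig_a": -7.1,
    "lig_b": -9.4,
    "lig_c": -6.0,
    "lig_d": -8.2,
    "lig_e": -11.3,
    "lig_f": -5.5,
}

for name, energy in top_ligands(energies):
    print(f"{name}\t{energy:.1f} kcal/mol")
```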

I. ORTHOLOGOUS DETECTION

a. Searching true orthologs:

Database search phase- In this step, the targets most likely to produce a meaningful alignment with the query are chosen (Table 1). The ortholog organisms were obtained from the ‘Candida genome database’. Four orthologous Candida species were found in the database, namely C. parapsilosis, C. dubliniensis, C. auris and C. glabrata (Table 1). EFG1 (Accession ID: C1_08590C_A) and NRG1 (Accession ID: C7_04230W_A) were carried forward to the alignment phase (Table 1).

b. Alignment phase- During this phase, the most promising targets in the database were aligned to the query sequence and scored using the Basic Local Alignment Search Tool (BLAST), one of the common algorithms used to find homologs (Table 3). During the database search phase, BLAST decomposes the query sequence into small words. These words are then compared to words in the database in order to find significant target sequences for alignment. NCBI’s BLASTP was used with a maximum E-value threshold of 3, excluding the organism Candida albicans, models and uncultured/environmental sample sequences; see Table 3 for the BLASTP results.

Table 1- List of ortholog organisms for EFG1 & NRG1

Species Name	Source Of Information	Accession ID	Gene	Function	Ortholog Organisms
C. albicans SC5314	Candida genome database	C1_08590C_A	EFG1	Squalene epoxidase, epoxidation of squalene to 2,3(S)-oxidosqualene; ergosterol biosynthesis; allylamine antifungal drug target; NADH reducing cofactor but S. cerevisiae EFG1 uses NADPH; flow model biofilm induced; Spider biofilm repressed	C. parapsilosis; C. dubliniensis; C. auris; C. glabrata
C. albicans SC5314	Candida genome database	C7_04230W_A	NRG1	Transcription factor/repressor; regulates chlamydospore formation/hyphal gene induction/virulence and rescue/stress response genes; effects both Tup1 dependent and independent regulation; flow model biofilm induced; Spider biofilm repressed	C. parapsilosis; C. dubliniensis; C. auris; C. glabrata

Using the gene information, the gene and protein sequences of EFG1 & NRG1 from the well-characterized Candida albicans SC5314 were retrieved from the ‘Candida genome database’ (Table 2). The location and length of each gene and protein are given in Table 2.

Table 2- EFG1 & NRG1 Gene sequence and respective protein sequence.

In the BLASTP results, a hit was required to cover at least 30% of the query protein sequence in the alignment (Table 3). In total, 10 proteins for EFG1 and 8 proteins for NRG1 were found with significant query coverage, from 100% down to 30% (Table 3). Identity of 80.73% to 61.19% for EFG1 and 45.02% to 67.11% for NRG1 was observed. The accession IDs of the orthologs, query coverage, E-value and identity are listed in Table 3. The highest query coverage and identity for EFG1 and NRG1 were observed with Candida maltosa (strain Xu316) (Accession ID: EMG45685.1) and Candida maltosa (strain Xu316) (Accession ID: EMG47643.1), respectively (Table 3).

Table 3- EFG1 & NRG1 BLASTP results. Hits were required to cover at least 30% of the query protein sequence in the alignment.

Protein	Accession ID of orthologs	Query coverage	E value	Identity
EFG1	EMG45685.1	100%	0.0	80.73%
EFG1	RCK64968.1	100%	0.0	79.82%
EFG1	RLV94007.1	99%	0.0	73.85%
EFG1	CCE43404.1	100%	0.0	68.88%
EFG1	PSK78524.1	99%	0.0	64.07%
EFG1	CCE88912.1	100%	0.0	60.55%
EFG1	KAA8900417.1	99%	0.0	60.69%
EFG1	SGZ50148.1	99%	0.0	61.56%
EFG1	QBM86330	100%	0.0	58.35%
EFG1	GEQ72255.1	100%	0.0	61.19%
NRG1	EMG47643.1	100%	2e-53	45.02%
NRG1	RCK65478.1	40%	1e-43	68.75%
NRG1	RLV92796.1	91%	1e-40	41.70%
NRG1	SGZ55399.1	32%	4e-33	64.52%
NRG1	CCE81639.1	44%	2e-32	53.10%
NRG1	GEQ72681.1	36%	2e-29	60.00%
NRG1	QBM89661.1	66%	3e-29	40.80%
NRG1	QFZ30513.1	30%	2e-28	67.11%
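The coverage and E-value criteria can be applied programmatically to the rows of Table 3. The sketch below copies the NRG1 rows into a string and filters them; the function names and the dictionary layout are illustrative choices, not part of any BLAST tooling.

```python
# NRG1 rows from Table 3: accession, query coverage, E-value, identity.
NRG1_HITS = """
EMG47643.1 100% 2e-53 45.02%
RCK65478.1 40% 1e-43 68.75%
RLV92796.1 91% 1e-40 41.70%
SGZ55399.1 32% 4e-33 64.52%
CCE81639.1 44% 2e-32 53.10%
GEQ72681.1 36% 2e-29 60.00%
QBM89661.1 66% 3e-29 40.80%
QFZ30513.1 30% 2e-28 67.11%
"""


def parse_hits(text):
    """Turn whitespace-separated rows into dictionaries with numeric fields."""
    hits = []
    for line in text.strip().splitlines():
        accession, coverage, evalue, identity = line.split()
        hits.append({
            "accession": accession,
            "coverage": float(coverage.rstrip("%")),
            "evalue": float(evalue),
            "identity": float(identity.rstrip("%")),
        })
    return hits


def filter_hits(hits, min_coverage=30.0, max_evalue=3.0):
    """Keep hits meeting the coverage and E-value thresholds from the text."""
    return [h for h in hits
            if h["coverage"] >= min_coverage and h["evalue"] <= max_evalue]


kept = filter_hits(parse_hits(NRG1_HITS))
print(len(kept), "of", len(parse_hits(NRG1_HITS)), "hits pass")
```

With the thresholds stated in the text, all eight NRG1 rows pass; raising `min_coverage` shows how quickly the candidate list shrinks.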

II. PHYSICO-CHEMICAL PROPERTY ANALYSIS

Analysis of the EFG1 and NRG1 proteins with the ProtParam tool provides valuable information about the proteins, including the number of amino acids, molecular weight, and theoretical pI (Table 4). The total number of amino acids in EFG1 and NRG1 is 550 and 310; the most frequently occurring amino acids are glutamine (18.7%) and proline (11.3%), respectively. The molecular weights were 59611.71 Da and 34299.22 Da (Table 4). The theoretical pI of EFG1 and NRG1 was 9.37 and 9.92. The total numbers of negatively charged residues are 17 and 10, and of positively charged residues 28 and 31, for EFG1 and NRG1 respectively. The molecular formula describes the atomic composition of carbon, hydrogen, nitrogen, oxygen and sulfur in each protein (Table 4). The extinction coefficients of EFG1 and NRG1 at 280 nm were 64640 M−1cm−1 and 29715 M−1cm−1, and an Abs 0.1% (= 1 g/L) value of 0.995 was observed (Table 4). The instability index, which is inversely related to protein stability, was 59.30 and 57.59 for EFG1 and NRG1. The estimated half-life of EFG1 and NRG1 was 30 hours (Table 4).

Table-4: Computational analysis of physicochemical parameters of EFG1 and NRG1 Proteins of C. albicans.

Sr.No.	Parameters / Proteins	EFG1	NRG1
1.	Theoretical pI	9.37	9.92
2.	Molecular Formula	C2553H3933N767O864S13	C1491H2327N451O463S10
3.	Total number of atoms	8130	4742
4.	Molecular weight:	59611.71	34299.22
5.	Number of amino acids	550	310
6.	Amino acid composition(High)	Gln (Q) 103(18.7%)	Pro (P) 35(11.3%)
7.	Amino acid composition(Low)	Pyl (O) & Sec (U) 0(0.0%)	Pyl (O) & Sec (U) 0(0.0%)
8.	Extinction coefficients (M^-1 cm^-1) at 280 nm measured in water	64640	29715
9.	Total number of charged residues Negatively(Asp + Glu)	17	10
10.	Total number of charged residues positively(Arg + Lys)	28	31
11.	Estimated half-life (Hrs.)	30	30
12.	Instability Index	59.30	57.59
13.	Aliphatic Index	43.40	49.52
14.	Grand Average of hydropathicity (GRAVY)	-1.005	-0.958
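The "Total number of atoms" row of Table 4 follows directly from the molecular formulas, which gives a quick consistency check. The short sketch below parses each formula and sums the element counts; the parsing helper is an illustrative function, not part of ProtParam.

```python
import re


def atom_counts(formula):
    """Parse a formula such as 'C2553H3933N767O864S13' into {element: count}."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d+)", formula.replace(" ", ""))
    return {element: int(count) for element, count in tokens}


# Molecular formulas from Table 4.
FORMULAS = {
    "EFG1": "C2553H3933N767O864S13",
    "NRG1": "C1491H2327N451O463S10",
}

for name, formula in FORMULAS.items():
    print(name, sum(atom_counts(formula).values()))
```

The sums reproduce the totals listed in Table 4 for both proteins.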

III. PHYLOGENETIC ANALYSIS

Multiple sequence alignment for EFG1 (Figure 1) & NRG1 (Figure 2) was done in MEGA using ClustalW. The deduced protein sequences of ten EFG1 sequences were aligned. The EFG1 (Accession ID: CR_07890W_A) encoding gene shows significant similarity among the homologs. The EFG1 gene sequence was compared with sequences from databases, and high similarity to several sequences with conserved regions was found (Figure 1), suggesting five conserved residue patches and conserved motifs. Each compared member has the conserved CDGIYSFR, SYF and GHV residue regions (Figure 1). Conserved Proline (P), Alanine (A), Histidine (H), Lysine (L), aspartic acid (Y) and (R) residues were found (Figure 1).

The deduced protein sequences of eight NRG1 sequences were aligned. The NRG1 (Accession ID: C7_04230W_A) encoding gene shows significant similarity among the homologs. The NRG1 gene sequence was compared with sequences from databases, and high similarity to several sequences with conserved regions was found (Figure 2), suggesting three conserved residue patches and conserved motifs. Each compared member has the conserved FTT, GHLA and RQDNC residue regions (Figure 2). Conserved Cysteine (C) and (Q) residues were found (Figure 2).

Phylogenetic tree construction by MEGA- The four parameters used in tree construction for EFG1 are as follows: Statistical method- Neighbor-Joining; Substitution type- Amino acid; Model/Method- p-distance; and Rate among sites- Uniform Rates. The optimal tree with the sum of branch length = 1.61318370 is shown (Figure 3). Evolutionary distances were computed using the p-distance method and are in units of the number of amino acid differences per site (Figure 3). This analysis involved 11 amino acid sequences. The dendrogram relates Candida albicans EFG1 (Accession No: CR_07890W) to Candida viswanathii (Accession No: RCK62494.1), Candida tropicalis (Accession No: AJT59418.1), Candida maltosa Xu316 (Accession No: EMG50168.1), Candida parapsilosis (Accession No: CCE45150.1), Meyerozyma sp. JA9 (Accession No: RLV83154.1), Millerozyma farinosa CBS 7064 (Accession No: CCE89067.1 and CCE79776.1), Candida intermedia (Accession No: SGZ46625.1), Clavispora lusitaniae (Accession No: QFZ26919.1 and OVF10431.1) and Diutina rugosa (Accession No: KAA8901616.1) (Figure 3). EFG1/CR_07890W shows distinct phylogenetic features among the compared members (Figure 3). The tree reveals two subgroups that diverged from each other with respect to EFG1 (Figure 3).

The four Parameters used in tree construction for NRG1 are as follows:

Statistical method- Maximum Likelihood and JTT (Jones-Taylor-Thornton) matrix-based model. Initial tree(s) were obtained by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with the superior log likelihood value.
Substitution type- Amino acid.
Model/Method- JTT model
Rate among site- Uniform Rates.

This analysis involved ten comparative protein sequences (Figure 4). The dendrogram relates Candida albicans NRG1 (Accession No: C7_04230W_A|NRG1) to Candida maltosa (strain Xu316) (Accession No: EMG47643.1), Candida viswanathii (Accession No: RCK65478.1), Spathaspora sp. JA1 (Accession No: RLV92796.1), [Candida] intermedia (Accession No: SGZ55399.1), Pichia sorbitophila (Accession No: CCE81639.1), Metschnikowia sp. JCM 33374 (Accession No: GEQ72681.1), Metschnikowia aff. pulcherrima (Accession No: QBM89661.1) and Diutina rugosa (Accession No: QFZ30513.1) (Figure 4). C7_04230W_A|NRG1 shows a phylogenetic relationship with Candida maltosa (strain Xu316) among the compared members (Figure 4). At a glance, there are two subgroups that diverged from each other with respect to NRG1 (Figure 4).

IV. PREDICTION OF THREE DIMENSIONAL STRUCTURE AND ACTIVE SITE.

Active site prediction for EFG1 (Q59X67) and NRG1 (Q5A0E5) was done using COACH.

PROTEIN STRUCTURES AND ACTIVE SITE: EFG1 and NRG1: COACH is a meta-server approach to protein-ligand binding site prediction. Based on the structure of the input target protein, COACH compares binding-specific substructures and sequence profiles and generates complementary ligand binding site predictions using two comparative methods, TM-SITE and S-SITE, which recognize ligand-binding templates from the BioLiP protein function database. These predictions are combined with results from other methods (including COFACTOR, FINDSITE and ConCavity) to generate the final ligand binding site predictions. For ligand-binding site prediction, a primary sequence can also be given as input, in which case I-TASSER is used to generate 3D models first, which are then fed into the COACH pipeline. The generated results are shown below:

Ligand binding sites in the active site of EFG1 are at residues Glycine (GLY) 256 and Glutamic acid (GLU) 261 (Figure 5). The features of the predicted target-ligand sites for five ligands are given in Table 5.

Table 5: The features of predicted ligand-binding sites for EFG1.

C-score	Cluster size	PDB Hit	Binding Ligand	Ligand Binding Site Residues
0.06	4	4uuxB	MG	256, 261
0.03	2	1q9uB	ZN	220,295
0.03	2	3tv5C	RCP	214,218,219,238
0.03	2	1mtbB	NTB	207,276
0.03	2	2wieC	CVM	243,328

C-score is the confidence score of the prediction. It ranges over [0-1], where a higher score indicates a more reliable prediction. In Table 5, the best predicted site is the first one (C = 0.06), with the ligand binding site at residues 256 and 261. The respective cluster size, i.e. the total number of templates in a cluster, is given (Table 5). The possible binding ligand information can be viewed in the BioLiP database. Complexes with one binding ligand (a single complex structure with the most representative ligand in the cluster) and with multiple binding ligands (complex structures with all potential binding ligands in the cluster) were also generated (Table 5).

Ligand binding sites in the active site of NRG1 are at residues Cysteine (CYS) 230 and 233 and Histidine (HIS) 246 and 250 (Figure 6). The features of the predicted target-ligand sites for five ligands are given in Table 6.

Table 6: The features of predicted ligand-binding sites for NRG1.

C-score	Cluster size	PDB Hit	Binding Ligand	Ligand Binding Site Residues
0.29	10	2ytoA	ZN	230, 233, 246, 250
0.09	3	2i13B	GOL	242, 245, 269
0.06	3	2i13B	Nuc. Acid	157,159,160,163,167,170,186,190,191,194, 195,198,205,207,209,212,216,219,237,239, 242,246,249,267,269,272,276,279
0.03	1	2i13A	ZN	246, 250
0.02	1	2i13B	GOL	272, 275

C-score is the confidence score of the prediction. It ranges over [0-1], where a higher score indicates a more reliable prediction. In Table 6, the best predicted site is the first one (C = 0.29), with the ligand binding site at residues 230, 233, 246 and 250 (Table 6). The respective cluster size, i.e. the total number of templates in a cluster, is given in Table 6. The possible binding ligand information can be viewed in the BioLiP database (Table 6). Complexes with one binding ligand (a single complex structure with the most representative ligand in the cluster) and with multiple binding ligands (complex structures with all potential binding ligands in the cluster) were also predicted (Table 6).

Protein Homology Modeling: EFG1

Enhanced filamentous growth protein 1 (EFG1) from Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast), UniProt ID Q59X67, was modeled with I-TASSER using multiple templates (Table 7).

Table 7: Table summarizing protein homology modeling features for EFG1.

Protein Name	Enhanced filamentous growth protein 1
Organism	Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
UniProt ID	Q59X67
Model Generated using	I-TASSER
Templates used	Multiple

1. Modeling Method:

For protein modeling, I-TASSER was used. I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. I-TASSER modeling starts from the structure templates identified by LOMETS from the PDB library. The table below shows the top 10 template-query alignments generated by LOMETS (Table 8). The significance of a template is measured by the Z-score, i.e. the difference between the raw and average scores in units of standard deviation. A Z-score > 1 indicates a good alignment (Table 8).

Table 8: The top ten threading templates are represented with significant modeling parameters.

Rank	PDB Hit	Identity 1	Identity 2	Coverage	Normalized Z-score
1	2nbiA	0.08	0.16	0.89	1.83
2	4ux5A	0.27	0.08	0.20	1.18

3	5jcsS	0.04	0.31	0.99	2.50
4	1l3g	0.21	0.07	0.22	5.19
5	1l3g	0.21	0.07	0.22	4.16
6	2nbiA	0.06	0.16	0.89	4.12
7	2nbiA	0.08	0.16	0.81	2.98
8	2nbiA	0.17	0.16	0.79	1.67
9	4ux5A	0.27	0.08	0.19	3.98
10	2nbiA	0.06	0.16	0.90	1.25

Identity 1 is the percentage sequence identity of the templates in the threading-aligned region with the query sequence, while Identity 2 is the percentage sequence identity of the whole template chains with the query sequence (Table 8). The coverage of the threading alignment is equal to the number of aligned residues divided by the length of the query protein. The normalized Z-scores of the threading alignments indicate good alignments (Table 8).

2. The structure of EFG1 as predicted: Structural details of target proteins:

The final model selected by I-TASSER is based on pair-wise structure similarity (Table 9). The confidence of each model is quantitatively measured by the C-score, which is calculated from the significance of the threading template alignments and the convergence parameters of the structure assembly simulations (Table 9). The C-score is typically in the range [-5, 2], where a higher value signifies a model with higher confidence and vice versa (Table 9). The model has an estimated TM-score = 0.49±0.15 and an estimated RMSD = 11.9±4.4Å, calculated from the C-score of -0.56 and the protein length, following the correlation observed between these quantities, as shown in Table 9.

Table 9: The structure was predicted by correlation observed between different modeling qualities.

Name	C-score	Exp. TM-Score	Exp. RMSD	No. of decoys	Cluster density
Model2	-0.56	0.49 ± 0.15	11.9 ± 4.4Å	600	0.2156

ERRAT Analyses: The statistics of non-bonded interactions between different atom types are analyzed, and the value of the error function versus the position of a 9-residue sliding window is plotted (Figure 8).

The plot generated indicates the confidence and overall quality of the model. On the error axis, two lines are drawn to indicate the confidence with which it is possible to reject regions that exceed that error value (Figure 9). The overall quality factor is expressed as the percentage of the protein for which the calculated error value falls below the 95% rejection limit. The structure, with a quality factor of 77.9116%, can be considered a reliable one (Figure 9).
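The quality factor described above is a percentage of sliding windows whose error value stays under the 95% rejection limit. The sketch below shows only that bookkeeping: the ERRAT error function itself is not reproduced, and the per-window error values and the limit used in the example are hypothetical.

```python
def sliding_windows(length, size=9):
    """Start/end indices (0-based, end exclusive) of every full window."""
    return [(start, start + size) for start in range(length - size + 1)]


def overall_quality_factor(window_errors, limit_95):
    """Percentage of windows whose error value is below the rejection limit."""
    if not window_errors:
        raise ValueError("no windows to evaluate")
    below = sum(1 for error in window_errors if error < limit_95)
    return 100.0 * below / len(window_errors)


# A 20-residue chain has 12 full 9-residue windows.
print(len(sliding_windows(20)))

# Hypothetical error values for four windows and a hypothetical limit.
print(overall_quality_factor([5.0, 12.0, 20.0, 8.0], limit_95=11.0))
```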

3. Model Validation: A Ramachandran plot (RC plot) was used to validate the structure. The Ramachandran plot shows the torsional angles phi (φ) and psi (ψ) of the residues (amino acids) in the structure (Figure 10). Glycine residues are identified separately by triangles, as they are not restricted to the regions of the plot appropriate to the other side chain types (Figure 10). The coloring/shading on the plot represents the different regions; the darkest areas (here shown in red) correspond to the "core" regions, representing the most favorable combinations of phi-psi values (Figure 10). Ideally, over 90% of the residues would fall in these "core" (favoured) regions (Figure 10). The percentage of residues in the "core" region is one of the better guides to stereochemical quality. Residues in the disallowed region should ideally amount to 0.2% or less. The RC plot was generated using PDBsum (which uses PROCHECK for plot generation) and is shown for the selected final model in Figure 10.

Table 10: Ramachandran plot (RC plot) statistics, generated by PDBsum, for the selected final model.

Region	No of residues	Percentage (%)
Most favoured regions [A,B,L]	258	56.1
Additional allowed regions [a,b,l,p]	160	34.8
Generously allowed regions [~a,~b,~l,~p]	26	5.7
Disallowed regions [XX]	16	3.5
Non-glycine and non-proline residues	460	100
End-residues (excl. Gly and Pro)	2	-
Glycine residues	39	-
Proline residues	49	-
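The percentages in Table 10 are each region's residue count divided by the number of non-glycine, non-proline residues. The short sketch below recomputes them from the counts in the table; the dictionary keys are shortened labels chosen for illustration.

```python
# Residue counts per Ramachandran region, taken from Table 10.
REGION_COUNTS = {
    "most favoured": 258,
    "additional allowed": 160,
    "generously allowed": 26,
    "disallowed": 16,
}


def region_percentages(counts):
    """Percentage of residues in each region, rounded to one decimal place."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


print(sum(REGION_COUNTS.values()))
print(region_percentages(REGION_COUNTS))
```

The four counts add up to the 460 non-glycine, non-proline residues listed in the table, and together with the 2 end residues, 39 glycines and 49 prolines they account for 550 residues.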

The analysis was based on 118 structures of at least 2.0 Angstroms resolution and R-factor no greater than 20%. In conclusion, given the overall modeling parameters, this is the most appropriate model that can be built with the available template(s), and it can be used for further experiments (Table 10).

Protein Homology Modeling: NRG1

Transcriptional regulator NRG1 from Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast), UniProt ID Q5A0E5, was modeled with I-TASSER using multiple templates (Table 11).

Table 11: Table summarizing protein homology modeling features for NRG1

Protein Name	Transcriptional regulator NRG1
Organism	Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
UniProt ID	Q5A0E5
Model Generated using	I-TASSER
Templates used	Multiple

1. Modeling Method:

For protein modeling, I-TASSER was used. I-TASSER modeling starts from the structure templates identified by LOMETS from the PDB library. Table 12 shows the top 10 template-query alignments generated by LOMETS. The significance of a template is measured by its Z-score, i.e. the difference between the raw and average scores in units of standard deviation; a normalized Z-score > 1 indicates a good alignment (Table 12).

Table 12: The top ten threading templates are represented in the table.

Rank	PDB Hit	Identity 1	Identity 2	Coverage	Normalized Z-score
1	5v3jE	0.15	0.18	0.85	3.70
2	5wjqA	0.13	0.18	0.88	1.55
3	5v3jE	0.12	0.18	0.77	4.24
4	5und	0.17	0.10	0.49	1.03
5	5v3j	0.14	0.18	0.85	1.42
6	5v3jE	0.14	0.18	0.78	6.37
7	5wjqD	0.14	0.18	0.86	1.83
8	5v3j	0.23	0.18	0.35	1.26
9	2nbiA	0.18	0.21	0.98	1.54
10	2ebtA	0.33	0.10	0.29	4.27

Identity 1 is the sequence identity of the template in the threading-aligned region with the query sequence, while Identity 2 is the sequence identity of the whole template chain with the query sequence; both are reported as fractions in Table 12. The coverage of the threading alignment is the number of aligned residues divided by the length of the query protein. The last column gives the normalized Z-score of each threading alignment (Table 12).
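The threading criteria described above can be sketched in a few lines of Python. This is a minimal illustration and not part of the I-TASSER or LOMETS software: the rows are copied from Table 12, while the function names `good_alignments`, `best_template` and `coverage` are hypothetical helpers introduced here.

```python
# Rows copied from Table 12:
# (rank, PDB hit, identity 1, identity 2, coverage, normalized Z-score)
TEMPLATES = [
    (1, "5v3jE", 0.15, 0.18, 0.85, 3.70),
    (2, "5wjqA", 0.13, 0.18, 0.88, 1.55),
    (3, "5v3jE", 0.12, 0.18, 0.77, 4.24),
    (4, "5und", 0.17, 0.10, 0.49, 1.03),
    (5, "5v3j", 0.14, 0.18, 0.85, 1.42),
    (6, "5v3jE", 0.14, 0.18, 0.78, 6.37),
    (7, "5wjqD", 0.14, 0.18, 0.86, 1.83),
    (8, "5v3j", 0.23, 0.18, 0.35, 1.26),
    (9, "2nbiA", 0.18, 0.21, 0.98, 1.54),
    (10, "2ebtA", 0.33, 0.10, 0.29, 4.27),
]


def good_alignments(rows, z_cutoff=1.0):
    """Keep the rows whose normalized Z-score exceeds the cutoff."""
    return [row for row in rows if row[5] > z_cutoff]


def best_template(rows):
    """Return the row with the highest normalized Z-score."""
    return max(rows, key=lambda row: row[5])


def coverage(aligned_residues, query_length):
    """Coverage = number of aligned residues / length of the query protein."""
    return aligned_residues / query_length


passing = good_alignments(TEMPLATES)
print(len(passing), "of", len(TEMPLATES), "alignments have a normalized Z-score > 1")
print("Highest normalized Z-score:", best_template(TEMPLATES))
```

The residue counts passed to `coverage` would come from the alignment itself; they are not listed in Table 12, so any values used with it are illustrative only.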

2. The structure of NRG1 as predicted: Structural details of target proteins:

I-TASSER selects the final models using the SPICKER program, based on pair-wise structure similarity, and reports up to five models corresponding to the five largest structure clusters. The confidence of each model is quantitatively measured by the C-score, which is calculated from the significance of the threading template alignments and the convergence parameters of the structure assembly simulations. The C-score typically lies in the range [-5, 2], where a higher value signifies a model with higher confidence and vice versa. The selected model has a C-score of -2.04, with an estimated TM-score of 0.47±0.15 and an estimated RMSD of 11.0±4.6 Å; these estimates are calculated from the C-score and the protein length, following the correlation observed between these quantities (Table 13).

Table 13: Quality estimates for the predicted model, derived from the correlation observed between the different modeling quantities.

Name	C-score	Exp. TM-score	Exp. RMSD (Å)	No. of decoys	Cluster density
Model1	-2.04	0.47±0.15	11.0±4.6	2319	0.0477

ERRAT Analyses: The statistics of non-bonded interactions between different atom types are analysed, and the value of the error function is plotted against the position of a 9-residue sliding window.

The resulting plot indicates the confidence and overall quality of the model (Figure 12b). On the error axis, two lines are drawn to indicate the confidence with which regions exceeding that error value can be rejected. The overall quality factor is expressed as the percentage of the protein for which the calculated error value falls below the 95% rejection limit. With a quality factor of 77.815%, the structure can be considered reasonably reliable (Figure 12b).
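The quality factor described above is a simple percentage, which the following Python sketch illustrates. It is not the ERRAT program itself: the per-window error values and the numeric rejection limit below are hypothetical placeholders, and `quality_factor` is a helper name introduced here for illustration.

```python
def quality_factor(error_values, rejection_limit):
    """Percentage of sliding windows whose error value is below the limit."""
    if not error_values:
        raise ValueError("at least one window is required")
    below = sum(1 for value in error_values if value < rejection_limit)
    return 100.0 * below / len(error_values)


# Hypothetical error-function values for eight sliding windows,
# and a hypothetical numeric value for the 95% rejection limit.
window_errors = [3.2, 5.1, 12.4, 7.7, 9.9, 15.0, 4.4, 6.0]
limit_95 = 11.5

print(f"Quality factor: {quality_factor(window_errors, limit_95):.1f}%")
```

In a real analysis the error values would be taken from the ERRAT output for every 9-residue window of the model.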

3. Model Validation:

The structure was validated using a Ramachandran plot (RC plot), which shows the backbone torsional angles phi (φ) and psi (ψ) of each residue in the structure (Figure 13). Glycine residues are plotted separately as triangles because they are not restricted to the regions of the plot appropriate to the other side-chain types. The shading on the plot represents the different regions: the darkest areas (shown here in red) correspond to the "core" regions, which represent the most favourable combinations of phi-psi values. Ideally, over 90% of the residues should fall in these "core" (favoured) regions; the percentage of residues in the "core" regions is one of the better guides to stereochemical quality. Residues in the disallowed regions should ideally account for no more than 0.2%. The RC plot for the selected final model was generated using PDBsum, which uses PROCHECK for plot generation, and is shown in Figure 13.

Table 14: Ramachandran plot (RC plot) statistics, generated by PDBsum, for the selected final model, highlighting the conformations within the model.

Region	No of residues	Percentage (%)
Most favoured regions [A,B,L]	170	65.4
Additional allowed regions [a,b,l,p]	70	26.9
Generously allowed regions [~a,~b,~l,~p]	15	5.8
Disallowed regions [XX]	5	1.9
Non-glycine and non-proline residues	260	100
End-residues (excl. Gly and Pro)	2	-
Glycine residues	13	-
Proline residues	35	-

The PROCHECK reference statistics are based on an analysis of 118 structures of at least 2.0 Å resolution and an R-factor no greater than 20%. This is the most appropriate model obtainable with the available template(s), and it can be used for further experiments (Figure 13, Table 14).
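The percentages in Tables 10 and 14 follow directly from the residue counts, and they can be compared with the ideal thresholds quoted in the text (over 90% in the most favoured regions, at most 0.2% disallowed). The Python sketch below recomputes them; the counts are copied from the two tables, while the dictionary keys and the function names are hypothetical labels chosen for this illustration.

```python
def ramachandran_percentages(counts):
    """Convert residue counts per region into percentages (one decimal)."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


def meets_ideal(percentages, core_min=90.0, disallowed_max=0.2):
    """Check the ideal thresholds quoted in the text for a good-quality model."""
    return (percentages["most_favoured"] > core_min
            and percentages["disallowed"] <= disallowed_max)


# Residue counts copied from Table 10 and Table 14.
TABLE_10 = {"most_favoured": 258, "additional_allowed": 160,
            "generously_allowed": 26, "disallowed": 16}
TABLE_14 = {"most_favoured": 170, "additional_allowed": 70,
            "generously_allowed": 15, "disallowed": 5}

for name, counts in (("Table 10", TABLE_10), ("Table 14", TABLE_14)):
    pct = ramachandran_percentages(counts)
    print(name, pct, "meets ideal thresholds:", meets_ideal(pct))
```

The totals used as denominators (460 and 260) are the numbers of non-glycine, non-proline residues reported in the tables.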

4.4 Selection of inhibitors for pharmacophore modelling with PHARMAGIST:

Ten potential inhibitors of the target protein Efg1 were docked using AutoDock 4.2 (Table 15). 'Retigeric_acid_B_CID_53319374' bound most favourably, with a binding energy of -7.43 kcal/mol, followed by 'Biatriosporin_D_CID_132523661' with -6.91 kcal/mol (Table 15).

Table 15: Inhibitors (Ligands) of EFG1 with their binding energy by Autodock 4.2.

Ligand	Target	Binding Energy (kcal/mol)
Biatriosporin_D_CID_132523661	EFG1	-6.91
amoxapine_CID_2170	EFG1	-5.69
TrichostatinA_CID_444732	EFG1	-5.46
chlorpromazine_CID_2726	EFG1	-4.58
Clozapine_CID_135398737	EFG1	-5.77
Fluperlapine_CID_49381	EFG1	-6.42
loxapine_CID_3964	EFG1	-5.5
farnesol_CID_445070	EFG1	-4.61
Retigeric_acid_B_CID_53319374	EFG1	-7.43
suberoylanilidehydroxamic-acid_CID_5311	EFG1	-4.85
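The ranking reported for Table 15 can be reproduced by sorting the ligands from the most negative (most favourable) binding energy upwards. The Python sketch below does this with the values copied from the table; `rank_by_binding_energy` is a hypothetical helper name, and the sketch does not attempt to reproduce the authors' final selection of compounds for the pharmacophore.

```python
# (ligand, binding energy in kcal/mol), copied from Table 15.
TABLE_15 = [
    ("Biatriosporin_D_CID_132523661", -6.91),
    ("amoxapine_CID_2170", -5.69),
    ("TrichostatinA_CID_444732", -5.46),
    ("chlorpromazine_CID_2726", -4.58),
    ("Clozapine_CID_135398737", -5.77),
    ("Fluperlapine_CID_49381", -6.42),
    ("loxapine_CID_3964", -5.5),
    ("farnesol_CID_445070", -4.61),
    ("Retigeric_acid_B_CID_53319374", -7.43),
    ("suberoylanilidehydroxamic-acid_CID_5311", -4.85),
]


def rank_by_binding_energy(rows):
    """Sort ligands so that the most negative binding energy comes first."""
    return sorted(rows, key=lambda row: row[1])


ranked = rank_by_binding_energy(TABLE_15)
for position, (ligand, energy) in enumerate(ranked, start=1):
    print(f"{position:2d}. {ligand:42s} {energy:6.2f}")
```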

On the basis of binding energy, 6 of the 8 compounds were selected to build the pharmacophore (Figure 14).

EFG1 and selected inhibitors pharmacophore

The two known inhibitors of the target protein Nrg1 were docked using AutoDock 4.2 (Table 16). 'Mdivi_CID_3825829' bound most favourably, with a binding energy of -5.01 kcal/mol, followed by 'Doxycycline_CID_54671203' with -2.57 kcal/mol (Table 16).

Table 16: Inhibitors of NRG1 with their binding energy by Autodock 4.2.

Ligand	Target	Binding Energy (kcal/mol)
Doxycycline_CID_54671203	NRG1	-2.57
Mdivi_CID_3825829	NRG1	-5.01

Since only two known inhibitors are available for the NRG1 target, a pharmacophore could not be built. Instead, the SwissSimilarity database was used (Figure 16).

SwissSimilarity: A Web Tool for Low to Ultra High Throughput Ligand-Based Virtual Screening

SwissSimilarity is a web tool for rapid ligand-based virtual screening of libraries of small molecules. Screenable compounds include drugs, bioactive and commercial molecules, as well as 205 million virtual compounds readily synthesizable from commercially available synthetic reagents. The user interface and backend have been designed for simplicity and ease of use, to provide proficient virtual screening capabilities to specialists and non-experts in the field (Figure 17).

The ZINC Drug-Like molecule database was selected for the fingerprint search. Screening can be performed using the following approaches:

FP2 molecular fingerprints from Open Babel.
Electroshape 5D (including atomic partial charges and lipophilicity contributions) for fast non-superpositional shape-based virtual screening.
Spectrophores: a fast non-superpositional shape-based virtual screening method developed by Silicos-IT and implemented in Open Babel 2.3.2, which uses one-dimensional descriptors generated from the property fields surrounding the molecules.
Shape-IT: A shape-based alignment tool developed by Silicos-IT, which represents molecules as a set of atomic Gaussians and performs molecular alignment as described by Grant and Pickup (J. Phys. Chem. 1995, 99, 3503).
Align-IT, a pharmacophore-based tool from Silicos-IT to align molecules by representing pharmacophoric features as Gaussian 3D volumes. Described by Grant and Pickup (J. Phys. Chem. 1995, 99, 3503).
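Two of the similarity measures that underlie these approaches, the Tanimoto coefficient on binary fingerprints and the Manhattan distance on descriptor vectors, can be sketched in plain Python. This is only an illustration of the two formulas: the fingerprints are represented here as hypothetical sets of "on" bit positions and the descriptors as short hypothetical vectors, whereas real FP2 fingerprints and Electroshape descriptors are produced by the screening software.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared 'on' bits divided by all 'on' bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / len(union)


def manhattan(vec_a, vec_b):
    """Manhattan distance: sum of absolute coordinate differences."""
    if len(vec_a) != len(vec_b):
        raise ValueError("descriptor vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(vec_a, vec_b))


# Hypothetical fingerprints (positions of bits set to 1).
query_bits = {1, 2, 3, 4}
candidate_bits = {3, 4, 5, 6}

# Hypothetical shape descriptors.
query_shape = [1, 2, 3]
candidate_shape = [2, 4, 6]

print("Tanimoto coefficient:", tanimoto(query_bits, candidate_bits))
print("Manhattan distance:", manhattan(query_shape, candidate_shape))
```

A higher Tanimoto coefficient means more similar fingerprints, while a smaller Manhattan distance means more similar descriptors.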

Combined score

In addition to the scores of the above-mentioned methods, it is possible to perform a consensus 2D/3D screening using a score based on both the FP2 Tanimoto coefficient (s1) and the Electroshape-5D Manhattan distance (s2). This combined score f(s1,s2) was developed for reverse screening with the SwissTargetPrediction web interface (Figure 18).

It was obtained by logistic regression using $f(s_1,s_2) = \left(1+\exp(-a_0 - a_1 s_1 - a_2 s_2)\right)^{-1}$, where $a_0$, $a_1$ and $a_2$ are parameters learned by the model to predict possible protein targets for a small molecule based on molecular similarity to known bioactive compounds (Figure 18). The score $f(s_1,s_2)$ ranges from 0 for totally dissimilar molecules to 1 for perfectly identical molecules. This combined score was found to perform significantly better for drug-like molecules than the similarity assessed by FP2 or Electroshape-5D separately. The combined score was used to screen the database (Figure 18, Figure 19).
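The logistic form of the combined score can be written out directly, as in the Python sketch below. The coefficient values are hypothetical placeholders chosen only to make the example run; the parameters actually learned for SwissSimilarity are not given in the text, and `combined_score` is a name introduced here for illustration.

```python
import math


def combined_score(s1, s2, a0, a1, a2):
    """Logistic combination f(s1, s2) = 1 / (1 + exp(-a0 - a1*s1 - a2*s2))."""
    return 1.0 / (1.0 + math.exp(-a0 - a1 * s1 - a2 * s2))


# Hypothetical coefficients: a positive weight on the Tanimoto coefficient
# (higher means more similar) and a negative weight on the Manhattan
# distance (higher means less similar).
A0, A1, A2 = -2.0, 6.0, -1.5

similar = combined_score(s1=0.95, s2=0.1, a0=A0, a1=A1, a2=A2)
dissimilar = combined_score(s1=0.10, s2=2.0, a0=A0, a1=A1, a2=A2)
print(f"similar pair: {similar:.3f}, dissimilar pair: {dissimilar:.3f}")
```

Because the logistic function maps any real number into the interval between 0 and 1, the combined score is bounded regardless of the scale of the two input similarities.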

The calculated scores ranged from 0.958 to 0.998; ZINC08578218 had the maximum score and ZINC07045454 the minimum (Figure 19). The eight most potent outputs were obtained from SwissSimilarity.

Molecular docking statistical result

Table 17: Docking results and binding energies computed for the top ZINC ligands for EFG1 and NRG1.

TARGET	LIGAND	Binding energy (kcal/mol)	Interacting amino acid	No. of hydrogen bonds
EFG1	ZINC31165359	-11.3	TYR101	1
	ZINC08765154	-11.1	No Interaction	-
	ZINC28582890	-10.6	GLN126	1
	ZINC62001396	-10.5	GLN126	1
	ZINC31165363	-10.1	GLN126	1
NRG1	ZINC20134767	-7.4	LYS231	1
	ZINC16430849	-7.2	ARG235	1
	ZINC16430852	-7.2	ARG235	1
	ZINC20134767	-7.4	LYS231	1
	ZINC12297005	-6.1	ARG248	1
	ZINC12297006	-6.1	ARG248	1

The docking results and binding energies computed for the most favourable ZINC ligands of EFG1 and NRG1 are given in Table 17. The five thermodynamically most favourable ligands docked with EFG1 are listed (Table 17, Figure 20). The EFG1-ZINC31165359 interaction, with a single hydrogen bond at residue TYR101, gave the lowest binding energy of -11.3 kcal/mol (Table 17, Figure 20). For NRG1, a total of 800 ligands screened by structural similarity were docked in the active site; ZINC20134767 showed the best result, with a binding energy of -7.4 kcal/mol and a hydrogen bond at LYS231 (Table 17, Figure 20).

Candida albicans, a polymorphic, opportunistic pathogen of humans, resides commensally in healthy humans [32, 33]. It exists in yeast, hyphal, pseudohyphal or chlamydospore forms [34, 35]. Its polymorphic nature is associated with a survival strategy gained through evolution, which has made C. albicans a highly versatile organism able to survive in extreme microenvironments [32–35]. There is therefore a pressing need to target multifunctional molecular candidates in order to control this threat.

In the present study, an attempt was made to analyse EFG1 (Enhanced filamentous growth protein 1) and NRG1 (Transcriptional regulator NRG1), genes that regulate the yeast-to-hyphal transition, by in silico methods.

EFG1 has significant molecular and biological functions as a transcriptional regulator of the switch between the white and opaque states. These two cell types differ in many characteristics, including cell structure, mating competence, and virulence [36–38]. EFG1 also contributes to virulence by regulating hyphal formation and the factors that enable C. albicans to invade and injure endothelial cells [36–38]. It acts as a major regulator of cell wall dynamics and plays a role in interactions with extracellular matrices [39]. EFG1 is required for both normoxic and hypoxic biofilm formation; hypoxic biofilm formation is a major cause of persistence and antifungal resistance during infections [40, 41]. It is also involved in drug resistance through regulation of ERG3 expression [41, 42].

The other regulator of C. albicans investigated here is NRG1, which is involved in the regulation of chlamydospore formation, hyphal growth, virulence and stress response [36, 43]. It plays a key role in regulating true hyphal growth [44] and directs transcriptional repression of a subset of filament-specific genes, such as HWP1, HYR1, ALS8 and ECE1, via the TUP1 pathway [43–45]. It also plays a key role in biofilm formation and dispersion [46].

This study was aimed at analysing the regulation-responsive genes EFG1 and NRG1: identification of their true orthologs, characterization of physico-chemical properties, phylogenetics, active site prediction and pharmacophore design. These investigations are useful for understanding the significance of these genes in the regulation of morphogenesis and virulence and for designing potential drug molecules against C. albicans infections.

Five true orthologous genes were identified by NCBI-BLAST for EFG1 and NRG1 (Table 1, Table 2). Orthologs are genes that have diverged from a common ancestor after a speciation event [47, 48]. The search for true orthologs was carried out in two steps. In the initial database search phase, BLASTP was used with a maximum E-value threshold of 5, excluding C. albicans itself, model sequences and uncultured/environmental sample sequences (Table 1, Table 2, Table 3).

Phylogenetic trees were generated using the Maximum Likelihood statistical method and a JTT matrix-based model (Fig. 1, Fig. 2). The substitution type was set to amino acid. The initial tree(s) were obtained by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with the superior log likelihood value (Fig. 3, Fig. 4). These results indicate that EFG1 and NRG1 of C. albicans are phylogenetically related and highly conserved across the lineage (Fig. 1, Fig. 2, Fig. 3 and Fig. 4). Statistically significant variations among clades, assessed by univariate analysis of variance, have been reported in the molecular phylogeny of C. albicans [49, 50]. Conserved residues and amino acid patches were observed, suggesting a unique relationship between protein structure, conformation, folding and function (Fig. 1, Fig. 2).

The physico-chemical parameters of EFG1 and NRG1 were computed [51, 52, 53] (Table 4). The theoretical pI values of EFG1 and NRG1 are 9.37 and 9.92, respectively, suggesting that both proteins are basic in their functional microenvironment (Table 4). The estimated half-life of 30 hours for both proteins suggests stability and prolonged functionality of EFG1 and NRG1 (Table 4). The negative grand average of hydropathicity (GRAVY) values indicate that both proteins are hydrophilic in nature (Table 4).
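The GRAVY value mentioned above is the mean Kyte-Doolittle hydropathy of all residues in a sequence, so a negative value indicates a predominantly hydrophilic protein. The Python sketch below shows the calculation; the short peptide is a hypothetical example rather than the EFG1 or NRG1 sequence, and `gravy` is a helper written for this illustration, not the ProtParam implementation.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Hypothetical peptide rich in charged residues.
peptide = "MKRSDEKR"
value = gravy(peptide)
print(f"GRAVY = {value:.3f} ({'hydrophilic' if value < 0 else 'hydrophobic'})")
```

For the full-length proteins, the same function could be applied to the sequences retrieved from UniProt.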

The modeled protein structures and predicted active sites of Efg1 and Nrg1 showed ligand-binding site residues at positions 256 and 261 (Efg1) and at positions 230, 233, 246 and 250 (Nrg1), as viewed in Discovery Studio, providing a basis for target-oriented drug design (Table 5, Table 6, Fig. 5 and Fig. 6). The validation plots indicate the confidence and overall quality of the protein models. The most appropriate models for Efg1 and Nrg1 were obtained from modeling studies with the available template(s) and used for further experiments (Fig. 8, Fig. 9, Fig. 10, Fig. 11, Fig. 12a, Fig. 12b, Fig. 13, Table 9, Table 10, Table 11, Table 13, Table 14).

The ability of pathogenic C. albicans to interconvert between budded and hyphal growth states is termed the budded-to-hyphal transition (BHT) [54]. It is important for C. albicans development and virulence [54]. Reported bioactive structural derivatives of clozapine, such as fluperlapine, clothiapine, loxapine, amoxapine and chlorpromazine, interacted with the Efg1 target with significant binding energies (Table 15, Fig. 14) [54]. A small phenolic compound, biatriosporin D (BD), displays anti-virulence activity by inhibiting adhesion, hyphal morphogenesis and biofilm formation of C. albicans [55]; Biatriosporin_D_CID_132523661 showed a binding energy of -6.91 kcal/mol (Table 15, Fig. 14). The lichen-derived small molecule retigeric acid B (RAB) significantly inhibits the filamentation of C. albicans [56]; Retigeric_acid_B_CID_53319374 interacted with Efg1 with a binding energy of -7.43 kcal/mol, the thermodynamically most favourable value (Table 15, Fig. 14). Suberoylanilide hydroxamic acid (SAHA) significantly inhibits growth of C. albicans [57]; in the present study, a binding energy of -4.85 kcal/mol was observed (Table 15, Fig. 14). Among Nrg1 inhibitors, Mdivi-1 is known to inhibit the yeast-to-hyphae transition of C. albicans [58]. Mdivi_CID_3825829 and Doxycycline_CID_54671203 showed binding energies of -5.01 and -2.57 kcal/mol, respectively (Table 16, Fig. 16).

For pharmacophore design, the dataset was built from the literature and molecular docking studies, considering only molecules that met a binding energy cutoff of -5 kcal/mol with the target (Table 8) [59–61]. The molecules were then screened against the ZINC database to obtain the final list of molecules or potential drugs (Table 12, Table 15, Table 17). For NRG1, the two known molecules were evaluated by molecular docking with a binding energy cutoff and then used for a structural similarity search against the ZINC Drug-Like molecule database, using the combined score, to obtain potential drugs (Table 15, Table 17) [59–61]. The pharmacophore used to find ligands for the target EFG1 was modelled with PharmaGist (Fig. 15) [59–61]. The final pharmacophore model was selected based on survival scoring and features (Table 15, Table 17, Fig. 20). To find ligands for NRG1, a structural similarity search was performed using SwissSimilarity, a simple web-based tool for ligand-based screening of several libraries of small molecules (Fig. 17, Fig. 18, Table 16, Table 17). The numbers of potential ligands obtained after screening for EFG1 and NRG1 were 1586 and 800, respectively (refer to "ZINC screened EFG1 molecules", "screened dox molecules.txt" and "screened mdivi molecules.txt" for the final sets of ligands, which are provided as supplementary data) [62].

All 1586 pharmacophore-screened ligands for EFG1 were docked in the active site. ZINC31165359 showed the best interaction with EFG1, with the lowest binding energy of -11.3 kcal/mol (Table 17, Fig. 20). For NRG1, a total of 800 ligands screened by structural similarity were docked in the active site, and ZINC20134767 showed the best result, with a binding energy of -7.4 kcal/mol (Table 17, Fig. 20). In conclusion, the computational tools used in this study are useful for finding new hits against disease targets, which can help in the development of potential drugs. Further in vitro experimental investigations are required to confirm these predictions.

This study reports the identification and characterization of true orthologous genes, physico-chemical analysis, and phylogenetic analysis relating the evolutionary history of the pathogenic lineage for two Candida albicans genes (EFG1 and NRG1) using bioinformatics tools. EFG1 and NRG1 have been reported to function in C. albicans morphogenesis, virulence and drug resistance; this multi-functionality makes them attractive as potential drug targets. Pharmacophore and structural similarity approaches were then used with known inhibitors to find potential drugs or inhibitors for EFG1 and NRG1. Molecular docking was used to analyse the interactions between the molecules and their respective targets, and the top compounds were picked on the basis of binding energy computed with the virtual screening tool AutoDock Vina. All 1586 pharmacophore-screened ligands for EFG1 were docked in the active site, and ZINC31165359 showed the best interaction with EFG1, with the lowest binding energy of -11.3 kcal/mol. For NRG1, a total of 800 ligands screened by structural similarity were docked in the active site, and ZINC20134767 showed the best result, with a binding energy of -7.4 kcal/mol (Table 17). In conclusion, the computational tools used in this study are useful for finding new hits against disease targets, which can help in the development of potential drugs.

The author declares no competing interests.

Bengtson, Stefan, et al. "Fungus-like mycelial fossils in 2.4-billion-year-old vesicular basalt." Nature Ecology & Evolution 1.6 (2017): 1–6.
Silva, Patricia Maria de Oliveira E. Temporal and spatial control of fungal filamentous growth in Candida albicans. Diss. Université Côte d'Azur, 2018.
Kadosh, David, and Alexander D. Johnson. "Induction of the Candida albicans filamentous growth program by relief of transcriptional repression: a genome-wide analysis." Molecular biology of the cell 16.6 (2005): 2903–2912.
Bander, Kalil I., and Thekra A. Hamad. "Prevalence of Vaginal Candidiasis among women and Diagnosis of Candida species from vaginal infection in Kirkuk city." Tikrit Journal of Pure Science 20.4 (2018): 5–15.
Biswas, Subhrajit, Patrick Van Dijck, and Asis Datta. "Environmental sensing and signal transduction pathways regulating morphopathogenic determinants of Candida albicans." Microbiology and Molecular Biology Reviews 71.2 (2007): 348–376.
Mayer, François L., Duncan Wilson, and Bernhard Hube. "Candida albicans pathogenicity mechanisms." Virulence 4.2 (2013): 119–128.
Sohn, K., et al. "EFG1 is a major regulator of cell wall dynamics in Candida albicans as revealed by DNA microarrays." Molecular microbiology 47.1 (2003): 89–102.
Noble, Suzanne M., Brittany A. Gianetti, and Jessica N. Witchley. "Candida albicans cell-type switching and functional plasticity in the mammalian host." Nature Reviews Microbiology 15.2 (2017): 96.
Xu, Hongbin, et al. "S. oralis activates the Efg1 filamentation pathway in C. albicans to promote cross-kingdom interactions and mucosal biofilms." Virulence 8.8 (2017): 1602–1617.
Leng, Ping, et al. "Efg1, a morphogenetic regulator in Candida albicans, is a sequence-specific DNA binding protein." Journal of bacteriology 183.13 (2001): 4090–4093.
Murad, A. Munir A., et al. "NRG1 represses yeast–hypha morphogenesis and hypha-specific gene expression in Candida albicans." The EMBO journal 20.17 (2001): 4742–4752.
Braun, Burkhard R., David Kadosh, and Alexander D. Johnson. "NRG1, a repressor of filamentous growth in C. albicans, is down-regulated during filament induction." The EMBO journal 20.17 (2001): 4753–4761.
Uppuluri, Priya, et al. "The transcriptional regulator Nrg1p controls Candida albicans biofilm formation and dispersion." Eukaryotic cell 9.10 (2010): 1531–1537.
Skrzypek, Marek S., et al. "The Candida Genome Database (CGD): incorporation of Assembly 22, systematic identifiers and visualization of high throughput sequencing data." Nucleic acids research (2016): gkw924.
Li, Yu-Cheng, and Yi-Chang Lu. "BLASTP-ACC: Parallel architecture and hardware accelerator design for BLAST-based protein sequence alignment." IEEE Transactions on Biomedical Circuits and Systems 13.6 (2019): 1771–1782.
Cosentino, Salvatore, and Wataru Iwasaki. "SonicParanoid: fast, accurate and easy orthology inference." Bioinformatics 35.1 (2019): 149–151.
ProtParam, E. "ExPASy-ProtParam tool." (2017).
Stecher, Glen, Koichiro Tamura, and Sudhir Kumar. "Molecular evolutionary genetics analysis (MEGA) for macOS." Molecular Biology and Evolution 37.4 (2020): 1237–1239.
Hung, Jui-Hung, and Zhiping Weng. "Sequence alignment and homology search with BLAST and ClustalW." Cold Spring Harbor Protocols 2016.11 (2016): pdb-prot093088.
Zhang, Chengxin, et al. "Template-based and free modeling of I‐TASSER and QUARK pipelines using predicted contact maps in CASP12." Proteins: Structure, Function, and Bioinformatics 86 (2018): 136–151.
Studio, Discovery. "Discovery Studio." Accelrys [2.1] (2008).
Webb, Benjamin, and Andrej Sali. "Comparative protein structure modeling using MODELLER." Current protocols in bioinformatics 54.1 (2016): 5–6.
Wu, Qi, et al. "COACH-D: improved protein–ligand binding sites prediction with refined ligand-binding poses through molecular docking." Nucleic acids research 46.W1 (2018): W438-W442.
Yang, Jianyi, et al. "The I-TASSER Suite: protein structure and function prediction." Nature methods 12.1 (2015): 7–8.
Laskowski, Roman A., et al. "PDBsum: Structural summaries of PDB entries." Protein science 27.1 (2018): 129–134.
Inbar Y, Schneidman-Duhovny D, Dror O, Nussinov R, Wolfson HJ. Deterministic Pharmacophore Detection via Multiple Flexible Alignment of Drug-Like Molecules. In Proc. of RECOMB 2007, vol. 3692 of Lecture Notes in Computer Science, pp. 423–434. Springer Verlag.
Schneidman-Duhovny D, Dror O, Inbar Y, Nussinov R, Wolfson HJ. PharmaGist: a webserver for ligand-based pharmacophore detection. Nucleic Acids Research 2008.
Dror O, Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ. Novel approach for efficient pharmacophore-based virtual screening: method and applications. J Chem Inf Model. 2009 Oct;49(10):2333–43.
David Ryan Koes, Carlos J. Camacho, ZINCPharmer: pharmacophore search of the ZINC database, Nucleic Acids Research, Volume 40, Issue W1, 1 July 2012, Pages W409–W414, https://doi.org/10.1093/nar/gks378
V. Zoete, A. Daina, C. Bovigny, and O. Michielin, SwissSimilarity. A web tool for low to ultra highthroughput ligand-based virtual screening, J. Chem. Inf. Model. 56 (2016), pp. 1399–1404.
Morris, G. M., Huey, R., Lindstrom, W., Sanner, M. F., Belew, R. K., Goodsell, D. S. and Olson, A. J. (2009) Autodock4 and AutoDockTools4: automated docking with selective receptor flexibility. J. Computational Chemistry 2009, 16: 2785-91.
Gow, Neil AR, and Bhawna Yadav. "Microbe Profile: Candida albicans: a shape-changing, opportunistic pathogenic fungus of humans." Microbiology 163.8 (2017): 1145–1147.
Mayer, François L., Duncan Wilson, and Bernhard Hube. "Candida albicans pathogenicity mechanisms." Virulence 4.2 (2013): 119–128.
Ingle, Sujata, et al. "Chlamydospore Specific Proteins of Candida albicans." Data 2.3 (2017): 26.
Lu, Yang, Chang Su, and Haoping Liu. "Candida albicans hyphal initiation and elongation." Trends in microbiology 22.12 (2014): 707–714.
Jones, Ted, et al. "The diploid genome sequence of Candida albicans." Proceedings of the National Academy of Sciences 101.19 (2004): 7329–7334.
Braun, Burkhard R., and Alexander D. Johnson. "TUP1, CPH1 and EFG1 make independent contributions to filamentation in Candida albicans." Genetics 155.1 (2000): 57–67.
Phan, Quynh T., Paul H. Belanger, and Scott G. Filler. "Role of Hyphal Formation in Interactions of Candida albicans with Endothelial Cells." Infection and immunity 68.6 (2000): 3485–3490.
Sohn, K., et al. "EFG1 is a major regulator of cell wall dynamics in Candida albicans as revealed by DNA microarrays." Molecular microbiology 47.1 (2003): 89–102.
Prasad, Tulika, et al. "Morphogenic regulator EFG1 affects the drug susceptibilities of pathogenic Candida albicans." FEMS yeast research 10.5 (2010): 587–596.
Stichternoth, Catrin, and Joachim F. Ernst. "Hypoxic adaptation by Efg1 regulates biofilm formation by Candida albicans." Applied and environmental microbiology 75.11 (2009): 3663–3672.
Lo, Hsiu-Jung, et al. "Efg1 involved in drug resistance by regulating the expression of ERG3 in Candida albicans." Antimicrobial agents and chemotherapy 49.3 (2005): 1213–1215.
Murad, A. Munir A., et al. "NRG1 represses yeast–hypha morphogenesis and hypha-specific gene expression in Candida albicans." The EMBO journal 20.17 (2001): 4742–4752.
Nantel, André, et al. "Transcription profiling of Candida albicans cells undergoing the yeast-to-hyphal transition." Molecular biology of the cell 13.10 (2002): 3452–3465.
Kadosh, David, and Alexander D. Johnson. "Induction of the Candida albicans filamentous growth program by relief of transcriptional repression: a genome-wide analysis." Molecular biology of the cell 16.6 (2005): 2903–2912.
Uppuluri, Priya, et al. "The transcriptional regulator Nrg1p controls Candida albicans biofilm formation and dispersion." Eukaryotic cell 9.10 (2010): 1531–1537.
Koonin, Eugene V. "Orthologs, paralogs, and evolutionary genomics." Annu. Rev. Genet. 39 (2005): 309–338.
Taylor, John S., and Jeroen Raes. "Duplication and divergence: the evolution of new genes and old ideas." Annu. Rev. Genet. 38 (2004): 615–643.
Odds, Frank C., et al. "Molecular phylogenetics of Candida albicans." Eukaryotic cell 6.6 (2007): 1041–1052.
Odds, Frank C. "Molecular phylogenetics and epidemiology of Candida albicans." Future microbiology 5.1 (2010): 67–79.
Nobile, Clarissa J., and Aaron P. Mitchell. "Genetics and genomics of Candida albicans biofilm formation." Cellular microbiology 8.9 (2006): 1382–1391.
Puri, Nidhi, et al. "Analysis of physico-chemical properties of substrates of ABC and MFS multidrug transporters of pathogenic Candida albicans." European journal of medicinal chemistry 45.11 (2010): 4813–4826.
Gasteiger, Elisabeth, et al. "Protein identification and analysis tools on the ExPASy server." The proteomics protocols handbook. Humana press, 2005. 571–607.
Midkiff, John, et al. "Small molecule inhibitors of the Candida albicans budded-to-hyphal transition act through multiple signaling pathways." PloS one 6.9 (2011): e25395.
Zhang, Ming, et al. "Biatriosporin D displays anti-virulence activity through decreasing the intracellular cAMP levels." Toxicology and Applied Pharmacology 100.322 (2017): 104–112.
Xiao, Xuan, Zhi-Cheng Wu, and Kuo-Chen Chou. "A multi-label classifier for predicting the subcellular localization of gram-negative bacterial proteins with both single and multiple sites." PloS one 6.6 (2011): e20592.
Simonetti, Giovanna, et al. "Histone deacetylase inhibitors may reduce pathogenicity and virulence in Candida albicans." FEMS yeast research 7.8 (2007): 1371–1380.
Koch, B., Barugahare, A. A., Lo, T. L., Huang, C., Schittenhelm, R. B., Powell, D. R., Beilharz, T. H., & Traven, A. (2018). A Metabolic Checkpoint for the Yeast-to-Hyphae Developmental Switch Regulated by Endogenous Nitric Oxide Signaling. Cell Reports,
Qasim, Romasa, et al. "An In-Silico Pharmacophore Based Anti-viral Drug development for Hepatitis C Virus."
Schneidman-Duhovny, Dina, et al. "PharmaGist: a webserver for ligand-based pharmacophore detection." Nucleic acids research 36.suppl_2 (2008): W223-W228.
Dror O, Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ. Novel approach for efficient pharmacophore-based virtual screening: method and applications. J Chem Inf Model. 2009 Oct;49(10):2333–43.
David Ryan Koes, Carlos J. Camacho. ZINCPharmer: pharmacophore search of the ZINC database. Nucleic Acids Research, Volume 40, Issue W1, 1July 2012, Pages W409-W414.
