All proteins
SGs play a prominent role in neurodegenerative diseases and myopathies, either through their accumulation in neurodegenerative brains or through mutations in genes encoding RBPs involved in the SG response (such as those linked with motor neuron diseases) [13]. Several bioinformatics studies have linked intrinsically disordered proteins (IDPs) with diseases such as diabetes, cancer, and cardiovascular and neurodegenerative diseases [10, 33–35]. IDPs can fold upon interaction with a binding partner, and their function is precisely controlled by post-translational modifications and alternative splicing. Together with the promiscuous and plastic nature of their binding, this makes them key players in various pathological conditions. Furthermore, protein-protein interactions involving IDPs are a promising target for drug development: a drug can be designed that mimics either the disordered motif or the IDP itself and targets the binding site [36].
A dense protein-protein interaction (PPI) network of 453 highly interconnected proteins of the SG proteome was generated using the STRING database at a medium confidence of 0.4 (Supplementary Figure S1). The 453 nodes (proteins) in this PPI network have 6153 edges (connections), significantly more than the 2289 edges expected for a random set of proteins of the same size and degree distribution drawn from the genome (PPI enrichment p-value < 10⁻¹⁶). Such an enrichment indicates that the proteins are at least partially biologically connected as a group. The average node degree, defined as the mean number of connections of a node to other nodes, is 27.2 for this network (i.e. on average, each protein interacts with 27 other proteins). The local clustering coefficient measures how close the neighbours of a node are to being a complete clique: it equals 1 if every neighbour of a given node Ni is also connected to every other neighbour of Ni, and 0 if no two neighbours of Ni are connected to each other. The average local clustering coefficient for this network is 0.384. The most important Gene Ontology (GO) biological processes assigned to this network are post-transcriptional regulation of gene expression, regulation of translation, regulation of mRNA metabolic process, mRNA metabolic process, nucleic acid metabolic process, regulation of gene expression and cellular nitrogen compound metabolic process. The prominent molecular functions associated with this network are RNA binding, nucleic acid binding, heterocyclic compound binding, organic cyclic compound binding, binding, mRNA binding, translation regulator activity, mRNA 3'-UTR binding, protein binding and cadherin binding.
Among the cellular components of this network are ribonucleoprotein granule, cytoplasmic ribonucleoprotein granule, cytoplasmic stress granule, supramolecular complex, nucleus, cytosol, intracellular, non-membrane-bounded organelle, ribonucleoprotein complex and intracellular non-membrane-bounded organelle.
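The two network statistics defined above can be reproduced with a few lines of code. The following is a minimal sketch in plain Python, not the STRING implementation: the function names and the four-node toy graph are hypothetical, and only the node and edge counts are taken from the text.

```python
from itertools import combinations


def average_node_degree(n_nodes, n_edges):
    """Mean degree of an undirected graph; every edge touches two nodes."""
    return 2 * n_edges / n_nodes


def local_clustering(adj, node):
    """Fraction of neighbour pairs of `node` that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: nodes with fewer than 2 neighbours score 0
    linked = sum(1 for a, b in combinations(sorted(neighbours), 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)


def average_local_clustering(adj):
    """Mean of the local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Node and edge counts reported for the full SG network (453 nodes, 6153 edges).
print(round(average_node_degree(453, 6153), 1))  # 27.2
# Observed versus expected edges (6153 vs. 2289) as a simple fold enrichment.
print(round(6153 / 2289, 2))

# Hypothetical toy graph: A-B-C form a triangle, D hangs off A.
toy = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(local_clustering(toy, "B"))  # both neighbours of B are linked -> 1.0
print(average_local_clustering(toy))
```

In the toy graph, B and C each sit in a complete clique (coefficient 1), A has one linked neighbour pair out of three, and D has a single neighbour, so the average falls between 0 and 1 in the same way as the value of 0.384 reported for the SG network.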
The proteins of the SG proteome were classified on the basis of PPIDMean, calculated by averaging the PPID values obtained from six disorder predictors (PONDR VLXT, VSL2, VL3 and FIT, and IUPred2A long and short) (Supplementary Table 1). Proteins were classified as highly ordered (PPIDMean < 10%), moderately disordered (10% ≤ PPIDMean < 30%) or highly disordered (PPIDMean ≥ 30%) [37]. Under this PPIDMean-based classification, 39 of the 460 proteins are predicted to be highly ordered, 161 moderately disordered and 260 highly disordered. Figure 1(A) shows a two-dimensional PPIDPONDR-FIT vs. PPIDMean plot representing the predisposition of the 460 SG component proteins to intrinsic disorder.
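The classification rule above is simple enough to state as code. This is an illustrative sketch only: the function names are hypothetical, and the per-predictor PPID values in `example` are invented for demonstration rather than taken from Supplementary Table 1.

```python
from statistics import mean


def ppid_mean(ppid_by_predictor):
    """Average the percent-disorder (PPID) values from the individual predictors."""
    return mean(ppid_by_predictor.values())


def classify_by_ppid(ppid):
    """Apply the 10% and 30% thresholds described in the text."""
    if ppid < 10:
        return "highly ordered"
    if ppid < 30:
        return "moderately disordered"
    return "highly disordered"


# Hypothetical PPID values (in %) for one protein across the six predictors.
example = {
    "PONDR VLXT": 42.0,
    "PONDR VSL2": 55.0,
    "PONDR VL3": 48.0,
    "PONDR FIT": 39.0,
    "IUPred2A long": 30.0,
    "IUPred2A short": 26.0,
}
value = ppid_mean(example)
print(value, classify_by_ppid(value))

# The three class sizes reported in the text should add up to the full proteome.
reported = {"highly ordered": 39, "moderately disordered": 161, "highly disordered": 260}
print(sum(reported.values()))  # 460
```

Note that the boundaries are half-open: a protein with PPIDMean of exactly 10% counts as moderately disordered, and one with exactly 30% as highly disordered.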
To gain further insight into the nature of intrinsic disorder in the entire SG proteome, we employed the combined CH-CDF plot analysis (Supplementary Table 2). This analysis provides information about the structural and physical characteristics of a protein based on the quadrant in which it is located. The four quadrants of the CH-CDF plot are: Q1 (protein predicted to be ordered by both the CH and CDF predictors), Q2 (protein predicted to be ordered by CH and disordered by CDF), Q3 (protein predicted to be disordered by both CH and CDF) and Q4 (protein predicted to be disordered by CH and ordered by CDF). Figure 1(B) shows the combined CH-CDF plot for the 460 stress granule protein components. A total of 189 proteins are found in Q1 and are predicted to be ordered by both predictors, while 163 proteins in Q2 are molten globular and/or hybrid proteins, predicted as ordered by CH and disordered by CDF. The numbers of ordered and disordered residues in these proteins are comparable, and these proteins may lack unique stable 3D structures. The Q3 quadrant contains 99 highly disordered proteins, with intrinsic disorder predicted by both predictors. The Q4 quadrant contains 9 proteins predicted as disordered by CH and ordered by CDF. Out of the 460 SG component proteins, 271 (58.9%) are predicted to be significantly disordered (i.e. located outside Q1), further validating that intrinsic disorder is abundant in the SG proteome.
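The quadrant assignment and the resulting disordered fraction can be sketched as follows. The function takes the two binary predictor calls directly instead of raw CH or CDF distances, so it makes no assumption about the sign conventions of the plot axes; the function name is hypothetical and the quadrant counts are those reported above.

```python
def ch_cdf_quadrant(ch_disordered, cdf_disordered):
    """Map the binary CH and CDF calls to the quadrant labels used in the text."""
    if not ch_disordered and not cdf_disordered:
        return "Q1"  # ordered by both predictors
    if not ch_disordered and cdf_disordered:
        return "Q2"  # ordered by CH, disordered by CDF
    if ch_disordered and cdf_disordered:
        return "Q3"  # disordered by both predictors
    return "Q4"      # disordered by CH, ordered by CDF


# Quadrant counts reported for the 460 SG proteins.
counts = {"Q1": 189, "Q2": 163, "Q3": 99, "Q4": 9}
total = sum(counts.values())

# A protein counts as disordered if at least one predictor calls it disordered.
disordered = total - counts["Q1"]
print(total, disordered, round(100 * disordered / total, 1))  # 460 271 58.9
```

The 58.9% figure is therefore the share of proteins outside Q1, which is a broader criterion than the Q3-only count of 99 proteins called disordered by both predictors.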
The 10 most disordered proteins in the “All proteins” category of MSGP, based on PPIDMean, are Non-histone chromosomal protein HMG-14 (HMGN1; PPIDMean: 100; MobiDB: 100%), High mobility group protein HMG-I/HMG-Y (HMGA1; 100; 100%), Protein CASC3 (CASC3; 94.38; 82.5%), Protein PRRC2A (PRRC2A; 93.21; 86.6%), Phostensin (PPP1R18; 89.01; 80.4%), 182 kDa tankyrase-1-binding protein (TNKS1BP1; 88.85; 79.1%), PRKC apoptosis WT1 regulator protein (PAWR; 87.94; No entry in MobiDB), Plasminogen activator inhibitor 1 RNA-binding protein (SERBP1; 86.68; 83.6%), Intracellular hyaluronan-binding protein 4 (HABP4; 86.44; 75.8%) and RNA-binding protein EWS (EWS; 85.87; 64.9%) (Table 1).
Table 1
The five most disordered proteins of the stress granule proteome, with their Universal Protein Resource (UniProt) ID, molecular function, PPIDMean, and OMIM information on probable links to human genetic disorders.
Protein Name | Abbreviation | Molecular Function | Uniprot ID | PPIDMean | OMIM Disease |
Non-histone chromosomal protein HMG-14 | HMGN1 | DNA binding, nucleosomal DNA binding | P05114 | 100 | |
High mobility group protein HMG-I/HMG-Y | HMGA1 | 5'-deoxyribose-5-phosphate lyase activity, DNA binding, chromatin binding, DNA-lyase activity, enzyme binding, ligand-dependent nuclear receptor transcription coactivator activity, peroxisome proliferator-activated receptor binding, retinoic acid receptor binding, transcriptional factor activity, transcription factor binding | P17096 | 100 | Susceptibility to Diabetes mellitus |
Phostensin | PPP1R18 | Actin binding, phosphatase binding | Q6NYC8 | 89.01604133 | |
182 kDa tankyrase-1-binding protein | TNKS1BP1 | Ankyrin repeat binding, cadherin binding, enzyme binding, macromolecular complex binding | Q9C0C2 | 88.85601504 | |
PRKC apoptosis WT1 regulator protein | PAWR | Actin binding, enzyme binding, leucine zipper domain binding, transcription co-repressor activity | Q96IZ0 | 87.94068627 | |
Figure 2 shows the functional disorder profiles of five of the 10 most disordered proteins in the “All proteins” section of the MSGP database, generated by the D2P2 platform. D2P2 provides complementary disorder evaluations along with disorder-related functional information by combining the outputs of IUPred, PONDR® VLXT, PrDOS [38], PONDR® VSL2B, PV2, and ESpritz [39] (disordered regions are represented by coloured bars). Figure 2 also shows the positions of mostly structured SCOP domains [40] generated by the SUPERFAMILY predictor [41], molecular recognition features (MoRF regions) identified by the ANCHOR algorithm [42], and different post-translational modifications (PTMs) obtained from the PhosphoSitePlus platform [43]. PTMs affect the efficiency of protein folding and the stability of protein conformations, and influence biological and catalytic activity [44]. The functional disorder profiles indicate an abundance of MoRF sites in all five proteins. Moreover, phosphorylation sites are abundant in PPP1R18, TNKS1BP1 and PAWR. In addition to phosphorylation sites, several ubiquitination and acetylation sites in HMGN1, and ubiquitination, acetylation and methylation sites in HMGA1, can also be seen.
Per-residue disorder profiles for HMGN1, HMGA1, TNKS1BP1, PPP1R18 and PAWR are shown in Fig. 3. These graphs suggest the presence of many IDRs that lack well-defined structures in their native state. The remarkable agreement between the outputs of the six predictors supports the high disorder propensity of these proteins.
The corresponding high-resolution AlphaFold structures of the five most disordered SG proteins are shown in Supplementary Figure S2 and indicate the presence of intrinsic disorder in all of these proteins (low per-residue confidence scores suggest unstructured regions). This is in agreement with the predictions of the disorder predictors employed above.
Next, we analysed the functional associations among the most disordered proteins of the SG proteome using the STRING database. This analysis showed that none of the 10 most disordered proteins interact with each other (Fig. 4A), indicating that this set of proteins is not well connected. Notably, this does not imply that these proteins are biologically unimportant; rather, it may simply mean that they have not been studied extensively and that their interactions have not yet been deposited in databases. We then extended the PPI network to include 50 first-shell interactors of these 10 highly disordered proteins. The corresponding functional association network, generated with STRING at the highest confidence of 0.9, is shown in Fig. 4B. The resulting PPI network includes 60 nodes connected by 390 edges, significantly more than the 112 edges expected for a randomly selected set of proteins of comparable size. The average node degree of this network is 13 (on average, each protein is linked to 13 other proteins) and the average local clustering coefficient is 0.789. The top 10 GO-annotated biological processes of this network are: nuclear-transcribed mRNA catabolic process; mRNA metabolic process; nuclear-transcribed mRNA catabolic process, nonsense-mediated decay; RNA metabolic process; nucleic acid metabolic process; heterocycle metabolic process; protein targeting to ER; cellular aromatic compound metabolic process; viral transcription; and SRP-dependent co-translational protein targeting to membrane. Among the molecular functions associated with this network are RNA binding, structural constituent of ribosome, nucleic acid binding, structural molecule activity, heterocyclic compound binding, organic cyclic compound binding, rRNA binding, mRNA binding, binding and ubiquitin ligase inhibitor activity.
The cellular components associated with this network are ribonucleoprotein complex, cytosolic ribosome, ribosomal subunit, ribosome, cytosolic small ribosomal subunit, protein containing complex, U2-type catalytic step 1 spliceosome, nuclear lumen, nucleus and exon-exon junction complex.
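For readers who wish to retrieve a comparable network programmatically, the sketch below builds a request URL for the STRING REST API. It is not the procedure used in this study, which does not state whether the web interface or the API was used. The parameter names (`identifiers`, `species`, `required_score`, `add_nodes`, `caller_identity`) follow our reading of the public STRING API documentation and should be checked against the current version. The helper name and the caller label are hypothetical, and the gene symbols are written as in the text, so some (for example EWS) may first need mapping to STRING identifiers.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"  # assumed base URL of the STRING REST API


def string_network_url(identifiers, confidence, add_nodes=0, species=9606,
                       output_format="tsv"):
    """Build (but do not send) a STRING 'network' request URL.

    confidence is on the 0-1 scale used in the text; the API is assumed to
    expect it on a 0-1000 scale, so 0.9 becomes 900.
    """
    params = {
        "identifiers": "\r".join(identifiers),  # carriage-return separated list
        "species": species,                      # 9606 = Homo sapiens
        "required_score": int(round(confidence * 1000)),
        "add_nodes": add_nodes,                  # extra interactors to add
        "caller_identity": "sg_disorder_example",  # hypothetical label
    }
    return f"{STRING_API}/{output_format}/network?{urlencode(params)}"


# Gene symbols of the 10 most disordered proteins, as written in the text.
top10 = ["HMGN1", "HMGA1", "CASC3", "PRRC2A", "PPP1R18",
         "TNKS1BP1", "PAWR", "SERBP1", "HABP4", "EWS"]

# Seed-only network and the network extended by 50 interactors, both at 0.9.
print(string_network_url(top10, 0.9))
print(string_network_url(top10, 0.9, add_nodes=50))
```

Whether `add_nodes=50` reproduces the "50 first-shell interactors" setting of the web interface exactly is an assumption; the number of returned nodes and edges should be compared with the values reported above before relying on it.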
The MSGP database (https://msgp.pt/) used the GEO2R tool to perform a differential transcriptomic analysis of the GEO datasets GSE33000 [45], GSE28894 and GSE4595 [46] to assess the expression profiles of SG components. That study showed that, out of the 464 SG genes, 380 had altered expression in the brains of AD patients, of which 187 were upregulated and 193 were downregulated [47]. Similarly, 395 genes had altered expression in the brains of HD patients, of which 195 showed increased and 200 showed decreased expression.
The gene expression levels of the five most disordered proteins (HMGN1, HMGA1, PPP1R18, TNKS1BP1 and PAWR) in the context of various neurodegenerative disorders, as obtained from the MSGP database, are shown in Supplementary Figure S3. The genes HMGN1, HMGA1 and TNKS1BP1 were upregulated in ALS but downregulated in both AD and PD. The genes PPP1R18 and PAWR were upregulated in AD and HD. This indicates that these genes are differentially regulated in different neurodegenerative diseases.
Five out of these 10 most disordered proteins are also RNA-binding proteins (CASC3, PRRC2A, SERBP1, HABP4 and EWS) and therefore, will be discussed in the next section.
RNA-Binding Proteins
RNA-binding proteins (RBPs) regulate essential aspects of RNA metabolism in the cell, including transcription, splicing, modification, intracellular trafficking, translation, degradation and elimination. They are critical effectors of gene expression, and their dysfunction leads to several human diseases. Certain RBPs tend to coalesce into membrane-less compartments by liquid-liquid phase separation, a process mediated by weak interactions between RNA molecules and RBPs involving low-complexity and intrinsically disordered regions. The dynamics of phase separation can potentially be linked with age, as can the gradual loss of regulation of gene expression and the increase in the number of misfolded proteins, which, combined with a decline in mitochondrial activity, lead to the aggregation of amyloids [16]. Furthermore, mutations in SG components are linked to several neurodegenerative diseases, and these mutations have been reported in aggregation-prone prion-like domains (PLDs), low-complexity domains (LCDs), intrinsically disordered regions (IDRs) and RNA-binding motifs within the RBPs [48].
The functional association network of the 250 RBPs of the SG proteome, generated using STRING at a medium confidence of 0.4, is shown in Supplementary Figure S4. The 245 nodes (proteins) in this PPI network have 3634 edges (connections), significantly more than the 756 edges expected for a random set of proteins of the same size and degree distribution drawn from the genome (PPI enrichment p-value < 10⁻¹⁶). Such an enrichment indicates that these proteins are at least partially biologically connected. The average node degree is 29.7 (i.e. on average, each protein interacts with about 30 other proteins) and the average local clustering coefficient is 0.464 for this network. The most important Gene Ontology (GO) biological processes assigned to this network are posttranscriptional regulation of gene expression, regulation of translation, regulation of mRNA metabolic process, RNA metabolic process, mRNA metabolic process, nucleic acid metabolic process, gene expression, cellular nitrogen compound metabolic process, RNA processing and nucleobase-containing compound metabolic process. The prominent molecular functions of this network are RNA binding, heterocyclic compound binding, organic cyclic compound binding, mRNA binding, binding, mRNA 3'-UTR binding, translation regulator activity, double-stranded RNA binding, nucleic acid binding and single-stranded RNA binding. Among the cellular components of this network are ribonucleoprotein granule, cytoplasmic ribonucleoprotein granule, ribonucleoprotein complex, cytoplasmic stress granule, nucleus, supramolecular complex, nuclear lumen, protein-containing complex, nucleoplasm and intracellular.
The combined CH-CDF plot was used to characterize intrinsic disorder in the RNA-binding proteome (Fig. 5A). In the combined CH-CDF plot for the RBPs, 64 proteins are found in Q1, indicating order by both predictors. A further 93 proteins are located in Q2, predicted as ordered by CH and disordered by CDF. A total of 89 proteins are predicted as disordered by both predictors and are found in Q3. Q4 contains 4 proteins predicted as disordered by CH and ordered by CDF. Out of the 250 RBPs in the SG proteome, 186 (74.4%) are identified as having a considerable amount of intrinsic disorder, in line with previous studies that suggested an abundance of disorder in the RBP components of SGs. According to the PPIDMean-based classification of the 250 RBPs of the SG proteome, 14 proteins are highly ordered (PPIDMean < 10%), 81 are moderately disordered (10% ≤ PPIDMean < 30%) and 155 are highly disordered (PPIDMean ≥ 30%). Figure 5B shows a two-dimensional PPIDPONDR-FIT vs. PPIDMean plot representing the predisposition of the 250 RBP components of stress granules to intrinsic disorder.
The 10 most disordered RBPs of the mammalian SG proteome are CASC3 (CASC3; PPIDMean: 94.38; No predicted disorder entry on MobiDB), PRRC2A (PRRC2A; 93.21; No entry on MobiDB), Plasminogen activator inhibitor 1 RNA-binding protein (SERBP1; 86.68; 42%), Intracellular hyaluronan-binding protein 4 (HABP4; 86.44; No entry on MobiDB), RNA-binding protein EWS (EWS; 85.87; No entry on MobiDB), Fused in Sarcoma (FUS; 84.53; No entry), Scaffold attachment factor B2 (SAFB2; 84.12; 49.8%), Eukaryotic translation initiation factor 4B (EIF4B; 84.06; No entry), Coiled-coil domain containing 9B (CCDC9B; 84.04; No entry) and Antigen KI-67 (MKI67; 83.88; No entry) (Table 2).
Table 2
The 10 most disordered RNA-binding proteins of the stress granule proteome, with their Universal Protein Resource (UniProt) ID, molecular function, PPIDMean, and OMIM information on probable links to human genetic disorders.
Protein Name | Abbreviation | Molecular Function | Uniprot ID | PPIDMean | OMIM Disease |
Protein CASC3 | CASC3 | Enzyme binding, identical protein binding, ubiquitin protein ligase binding | O15234 | 94.38158606 | |
Protein PRRC2A | PRRC2A | RNA-binding | P48634 | 93.21509581 | |
Plasminogen activator inhibitor 1 RNA-binding protein | SERBP1 | Cadherin binding, mRNA 3'-UTR binding, RNA binding | Q8NC51 | 86.68094771 | |
Intracellular hyaluronan-binding protein 4 | HABP4 | RNA-binding | Q5JVS0 | 86.44173123 | |
RNA-binding protein EWS | EWS | Calmodulin binding, identical protein binding, metal ion binding, RNA binding | Q01844 | 85.875 | Ewing sarcoma, Neuroepithelioma |
Fused in Sarcoma | FUS | DNA binding, estrogen receptor binding, identical protein binding, ionotropic glutamate receptor binding, metal ion binding, myosin V binding, retinoid X receptor binding, RNA binding, thyroid hormone receptor binding, transcription coactivator activity | P35637 | 84.53581749 | Amyotrophic lateral sclerosis, Tremor |
Scaffold attachment factor B2 | SAFB2 | Double-stranded DNA binding, identical protein binding, RNA binding, sequence-specific DNA binding | Q14151 | 84.12063484 | |
Eukaryotic translation initiation factor 4B | EIF4B | Helicase activity, ribosomal small subunit binding, RNA binding, RNA strand annealing and exchange activity, translation initiation factor activity | P23588 | 84.06964539 | |
Coiled-coil domain containing 9B | CCDC9B | RNA-binding | Q6ZUT6 | 84.04989388 | |
Antigen KI-67 | MKI67 | ATP binding, DNA binding, protein C-terminus binding, RNA binding | P46013 | 83.88072072 | |
Figure 6 shows the functional disorder profiles of the 10 most disordered RBPs generated by the D2P2 platform, which provide complementary disorder evaluations along with disorder-related functional information. In line with their high intrinsic disorder propensity, these RBPs are enriched in MoRF sites. Per-residue disorder profiles for CASC3, PRRC2A, SERBP1, HABP4, EWS, FUS, SAFB2, EIF4B, CCDC9B and MKI67 are shown in Fig. 7. These graphs suggest the presence of multiple intrinsically disordered regions, and remarkable agreement can be seen between the outputs of the six predictors. The corresponding AlphaFold structures of these 10 most disordered RBPs indicate the presence of intrinsic disorder in all of these proteins (Supplementary Figure S5). This is in concordance with the predictions of the disorder predictors employed above.
We next analysed the functional associations among the 10 most disordered RBPs of the SG proteome using the STRING database at the highest confidence of 0.9. The 10 proteins (nodes) of this network do not form any connections (PPI enrichment p-value = 1) (Fig. 8A). This could simply mean that these proteins have not been studied extensively and that their interactions are not yet known to STRING. The average node degree and the average local clustering coefficient of this network are both 0. RNA binding and SUMO binding are the major molecular functions associated with this network. We further extended the PPI network to include 50 first-shell interactors of these 10 highly disordered proteins. The corresponding functional association network, generated with STRING at the highest confidence of 0.9, is shown in Fig. 8B. The resulting STRING network has 60 nodes and 282 edges, significantly more than the 99 expected edges, suggesting that these proteins interact with one another more frequently than a random set of proteins of the same size and degree distribution selected from the genome. An enrichment of this kind suggests that, collectively, the proteins are biologically connected. The average node degree and average local clustering coefficient for this network are 9.4 and 0.729, respectively. The GO biological processes assigned to this network are: nuclear-transcribed mRNA catabolic process, nonsense-mediated decay; mRNA metabolic process; gene expression; RNA catabolic process; translation initiation; RNA metabolic process; cellular nitrogen compound metabolic process; posttranscriptional regulation of gene expression; nucleic acid metabolic process; and translation.
The most prominent molecular functions associated with this network are RNA binding, nucleic acid binding, heterocyclic compound binding, organic cyclic compound binding, structural constituent of ribosome, mRNA binding, binding, translation initiation factor activity, translation regulator activity and structural molecule activity. Among the cellular components are ribonucleoprotein complex, U2-type catalytic step 1 spliceosome, cytosolic ribosome, protein-containing complex, exon-exon junction complex, ribosome, ribosomal subunit, spliceosomal complex, nucleoplasm and nuclear lumen.
The gene expression profiles of these 10 most disordered RBPs in the context of different neurodegenerative diseases, as obtained from the MSGP database, are shown in Supplementary Figure S6. Most SG components are RBPs, which play a variety of crucial roles in neuronal physiology. Changes in their expression could therefore influence, or even cause, neurodegenerative pathophysiology. In ALS, only EWS shows decreased expression, while all the others show increased expression. In AD, CASC3 and HABP4 show decreased expression, while in HD, SERBP1, HABP4 and FUS show decreased expression. In PD, most of the genes are also upregulated. The expression analysis thus indicates increased expression of most of these RBPs across the neurodegenerative diseases examined.
Autophagic Proteins
Autophagy is a major catabolic pathway for the degradation and recycling of cellular components. It is tightly regulated to maintain cellular homeostasis and provides an alternative pathway for stress granule clearance [49]. Defective autophagy has been linked to Parkinson’s disease, Huntington’s disease, amyotrophic lateral sclerosis, multiple sclerosis, frontotemporal dementia, cortical atrophy, glaucoma, epilepsy, diabetes mellitus and cancer. Moreover, recent studies have demonstrated that the autophagy pathway relies heavily on dynamic and flexible protein regions, and that intrinsic disorder provides conformations suitable for regulation [50–52]. Furthermore, under stress conditions neurons may interact with their intracellular SGs to degrade large amounts of RNA through the autophagy pathway. When the stress is removed, SGs are eliminated through autophagy and normal translation is restored [5]. However, under persistent cellular stress, the c9orf72 mutation [53, 54] blocks the stress granule-autophagy pathway, leading to the formation of TDP-43 inclusions in the neuronal cytoplasm, a characteristic hallmark of ALS/FTD.
The functional association network of the 32 autophagic proteins of the SG proteome is shown in Supplementary Figure S7. The 32 nodes (proteins) are connected by 62 edges (connections), significantly more than the 21 edges expected for a random set of proteins of the same size and degree distribution drawn from the genome. The average node degree and average local clustering coefficient for this network are 3.88 and 0.591, respectively. The GO biological processes assigned to this network are regulation of cellular protein metabolic process, cellular response to stress, regulation of nitrogen compound metabolic process, regulation of translation, regulation of primary metabolic process, regulation of cellular biosynthetic process, regulation of cellular metabolic process, regulation of metabolic process, cellular protein modification process and regulation of cellular macromolecule biosynthetic process. The most prominent molecular functions associated with this network are: translation regulator activity; protein serine/threonine kinase activity; protein kinase activity; nucleotide binding; catalytic activity, acting on a protein; purine nucleotide binding; purine ribonucleoside triphosphate binding; adenyl nucleotide binding; purine ribonucleotide binding; and enzyme binding. Among the cellular components of this network are cytoplasm, cytosol, translation release factor complex, glutamatergic complex, TORC1 complex, aggresome, lysosome, supramolecular complex, autolysosome and cytoplasmic stress granule.
The PPIDMean-based classification of the 32 SG proteins linked to the autophagy pathway identified 5 proteins as highly ordered (PPIDMean < 10%), 17 as moderately disordered (10% ≤ PPIDMean < 30%) and 10 as highly disordered (PPIDMean ≥ 30%). Figure 9A shows a two-dimensional PPIDPONDR-FIT vs. PPIDMean plot representing the predisposition of the 32 autophagic proteins of SGs to intrinsic disorder. The combined CH-CDF plot analysis (Fig. 9B) provided further insight into the presence of intrinsic disorder within the autophagic SG proteome. The 32 autophagic SG proteins are found only in the Q1 and Q2 quadrants of the CH-CDF phase space. Of these, 18 proteins are found in Q1 and are predicted as ordered by both predictors, whereas 14 proteins are found in Q2, predicted as ordered by CH and disordered by CDF. Therefore, only 14 proteins (43.75%) of the autophagic SG proteome are considered to have a measurable amount of disorder.
These results contrast with the CH-CDF plot obtained for the RBPs, in which abundant intrinsic disorder was predicted. The CH-CDF plot obtained for the autophagic SG proteome therefore suggests that the intrinsic disorder propensity of these proteins is considerably lower.
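The contrast between the three protein sets can be made explicit by recomputing the disordered fractions from the counts reported in the text. The sketch below does only that arithmetic; the dictionary layout and function name are hypothetical, and no new data are introduced.

```python
# Counts reported in the text for each protein set:
# total proteins, proteins in CH-CDF quadrant Q1 (ordered by both predictors),
# and proteins classified as highly disordered (PPIDMean >= 30%).
subsets = {
    "All SG proteins": {"total": 460, "q1": 189, "highly_disordered": 260},
    "RBPs": {"total": 250, "q1": 64, "highly_disordered": 155},
    "Autophagic proteins": {"total": 32, "q1": 18, "highly_disordered": 10},
}


def disorder_percentages(entry):
    """Return (% outside Q1 in the CH-CDF plot, % with PPIDMean >= 30%)."""
    total = entry["total"]
    ch_cdf_disordered = 100 * (total - entry["q1"]) / total
    ppid_highly_disordered = 100 * entry["highly_disordered"] / total
    return ch_cdf_disordered, ppid_highly_disordered


for name, entry in subsets.items():
    ch_cdf, ppid = disorder_percentages(entry)
    print(f"{name}: CH-CDF disordered {ch_cdf:.2f}%, highly disordered {ppid:.2f}%")
```

By both measures the ordering is the same: the RBPs are the most disordered set, the autophagic proteins the least, and the full SG proteome lies in between.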
The 10 most disordered autophagy proteins of the SGs are BAG family molecular chaperone regulator 3 (BAG3; PPIDMean: 83.76; MobiDB: 86.43%), Hamartin (TSC1; 56.30; No predicted disorder entry on MobiDB), Sequestosome-1 (SQSTM1; 54.31; 37.05%), Serine/threonine-protein kinase PAK 4 (PAK4; 51.01; 28.26%), Eukaryotic translation initiation factor 2-alpha kinase 1 (EIF2AK1; 36.69; 38.73%), Protein transport protein Sec24C (SEC24C; 36.34; 29.80%), CUGBP Elav-like family member 1 (CELF1; 34.05; 38.89%), Histone deacetylase 6 (HDAC6; 33.96; 30.29%), Interferon-induced, double-stranded RNA-activated protein kinase (EIF2AK2; 33.15; 10.71%) and Dual specificity mitogen-activated protein kinase 7 (MAP2K7; 30.83; 7.40%) (Table 3).
Table 3
The 10 most disordered autophagy-linked proteins of the stress granule proteome, with their Universal Protein Resource (UniProt) ID, molecular function, PPIDMean, and OMIM information on probable links to human genetic disorders.
Protein Name | Abbreviation | Molecular Function | Uniprot ID | PPIDMean | OMIM Disease |
BAG family molecular chaperone regulator 3 | BAG3 | Adenyl-nucleotide exchange factor activity, cadherin binding, chaperone binding, protein complex binding | O95817 | 83.76956522 | Cardiomyopathy, dilated, 1HH; Myopathy, myofibrillar |
Hamartin | TSC1 | Chaperone binding, GTPase activating protein binding, protein N-terminus binding | Q92574 | 56.30111111 | Focal cortical dysplasia, Lymphangioleiomyomatosis, Tuberous sclerosis |
Sequestosome-1 | SQSTM1 | Enzyme binding, identical protein binding, ionotropic glutamate receptor binding, K63-linked polyubiquitin modification-dependent protein binding, protein homodimerization activity, protein kinase binding, protein kinase C binding, protein serine/threonine kinase activity, receptor tyrosine kinase binding, SH2 domain binding, ubiquitin binding, ubiquitin protein ligase binding, zinc ion binding | Q13501 | 54.31712121 | Frontotemporal dementia and/or amyotrophic lateral sclerosis, Myopathy, Neurodegeneration (with ataxia, dystonia, and gaze palsy), Paget disease of bone |
Serine/threonine-protein kinase PAK 4 | PAK4 | ATP binding, cadherin binding, cell-cell adhesion, serine/threonine kinase activity, Rac GTPase binding | O96013 | 51.01591653 | |
Eukaryotic translation initiation factor 2-alpha kinase 1 | EIF2AK1 | ATP binding, eukaryotic translation initiation factor 2 alpha kinase activity, heme binding, protein homodimerization activity | Q9BQI3 | 36.69179894 | |
Protein transport protein Sec24C | SEC24C | SNARE binding, zinc ion binding | P53992 | 36.34955515 | |
CUGBP Elav-like family member 1 | CELF1 | BRE binding, mRNA binding, RNA binding, translation repressor activity, nucleic acid binding | Q92879 | 34.05430041 | |
Histone deacetylase 6 | HDAC6 | histone deacetylase activity, actin binding, tubulin binding, tubulin deacetylase activity, beta-catenin binding, dynein complex binding, promoter binding, enzyme binding, Hsp90 binding, microtubule binding, misfolded protein binding, polyubiquitin binding, tau protein binding, ubiquitin protein ligase binding, zinc ion binding | Q9UBN7 | 33.96599451 | Chondrodysplasia |
Interferon-induced, double-stranded RNA-activated protein kinase | EIF2AK2 | ATP binding, double-stranded RNA binding, non-membrane spanning protein tyrosine kinase activity, protein serine/threonine kinase activity, protein phosphatase regulator activity, RNA binding | P19525 | 33.15298246 | |
Dual specificity mitogen-activated protein kinase 7 | MAP2K7 | ATP binding, enzyme binding, magnesium ion binding, protein serine/threonine kinase activity, protein tyrosine kinase activity, MAP kinase activity, protein phosphatase binding | O14733 | 30.83772076 | |
Figure 10 shows the functional disorder profiles of the 10 most disordered autophagy-linked SG proteins generated by the D2P2 platform. The profiles indicate that these autophagic proteins contain only a few IDRs and, correspondingly, only a few MoRF sites. Per-residue disorder profiles for BAG3, TSC1, SQSTM1, PAK4, EIF2AK1, SEC24C, CELF1, HDAC6, EIF2AK2 and MAP2K7 are shown in Fig. 11. These graphs suggest the presence of multiple intrinsically disordered regions, and remarkable agreement can be seen between the outputs of the six predictors.
The corresponding AlphaFold structures of the 10 most disordered autophagic SG proteins indicate the presence of intrinsic disorder (Supplementary Figure S8). These structures consist mostly of highly structured regions (in blue), with only a few unstructured regions (in yellow and orange), in concordance with the disorder predictions of the tools employed above.
A PPI network of the 10 most disordered autophagic SG proteins, generated at the highest confidence of 0.9, is shown in Fig. 12A. This functional association network has 10 nodes and 2 edges (connections), more than the 0 edges expected. The average node degree and average local clustering coefficient of this network are 0.4 and 0.2, respectively. The GO biological processes assigned to this network are aggresome assembly, positive regulation of macroautophagy, negative regulation of translation, regulation of cellular response to stress, negative regulation of cellular protein metabolic process, regulation of protein phosphorylation, negative regulation of cellular macromolecule biosynthetic process, cellular response to stress, negative regulation of gene expression and regulation of cellular protein metabolic process. The molecular functions associated with this network are protein serine/threonine kinase activity, eukaryotic translation initiation factor 2alpha kinase activity and translation regulator activity. Among the cellular components are aggresome and chaperone complex.
Next, we extended the PPI network to include 50 first-shell interactors of the 10 most disordered autophagic SG proteins (Fig. 12B). The 60 nodes (proteins) form 199 edges (connections), more than the 91 edges expected. The average node degree of this network is 6.63 and the average local clustering coefficient is 0.656. The GO biological processes assigned to this network are cellular response to stress, response to stress, positive regulation of molecular function, regulation of protein metabolic process, regulation of cellular protein metabolic process, regulation of cellular response to stress, regulation of protein modification process, positive regulation of catalytic activity, COPII-coated vesicle cargo loading and regulation of phosphorylation. The most prominent molecular functions are ubiquitin protein ligase binding, enzyme binding, protein binding, protein serine/threonine kinase activity, binding, protein kinase activity, purine nucleotide binding, nucleotide binding, purine ribonucleoside triphosphate binding and purine ribonucleotide binding. Among the cellular components are cytosol, COPII vesicle coat, cytoplasm, ER to Golgi transport vesicle membrane, perinuclear region of cytoplasm, whole membrane, intracellular membrane-bounded organelle, cytoplasmic vesicle, intracellular organelle and bounding membrane of organelle.
The gene expression analysis of the 10 most disordered autophagy components of the SG proteome in the context of different neurodegenerative diseases, as obtained from the MSGP database, is shown in Supplementary Figure S9. All of the genes except EIF2AK2 show increased expression in AD. The expression of BAG3 and CELF1 is repressed in ALS, while in HD, HDAC6 and CELF1 show decreased expression. In PD, PAK4, EIF2AK1, SEC24C and MAP2K7 are downregulated, suggesting different roles for autophagic proteins in different neurodegenerative diseases.