Disorder in C1CRs
The ICDs of the C1CRs have been very sparsely studied structurally, likely because of their high expected abundance of disorder. Moreover, many of the receptors exist in several alternatively spliced versions, some of which differ in the ICDs. To provide an overview of the ICD isoforms, we conducted a survey of known isoforms of the entire family (Fig. 1c and Table 1). We followed the grouping of the receptors made based on structure and evolution [30,31], but excluded receptors absent in humans and/or for which no ICD could be annotated. Furthermore, IL-2Ra and IL-15Ra were excluded as they lack the structural hallmarks of the family and likely belong to a separate family [105]. This left us with a total of 29 different receptors, distributed with four members in group 1 (single-chain homodimers), ten members in group 2 (the gp130 family, not counting receptors binding ancestral cytokines), two members in group 3 (soluble a-chains, leaving out four receptors without ICDs: IL-27Rb, ciliary neurotrophic factor receptor subunit a (CNTFRa), cytokine receptor-like factor 1 (CRLF1), and IL-12Rb), six members in group 4 (long-tailed receptor chains) and seven members in group 5 (short-tailed receptor chains) (Table 1). The 29 receptors provided a total of 54 ICD isoforms distributed across all groups. Approximately 40% of the receptors (12 receptors; leukemia inhibitory factor receptor (LIFR), OSMR, IL-6Rb (gp130), IL-27R, IL-6Ra, IL-2Rb, IL-21R, IL-4Ra, bc (IL-3Rb), CRLF2, gc (IL-2Rg), IL-13Ra1, IL-13Ra2, and IL-3Ra) only had one ICD isoform, and three of these were the common receptors IL-6Rb (gp130), bc (IL-3Rb) and gc (IL-2Rg). For the remaining 16 receptors, up to five different isoforms could be identified and nine of the receptors with ICDs > 200 residues had a short isoform < 50 residues.
A total of 16 isoforms had unique sequences, typically > 10 residues with an average length of 25±16 residues, and the longest unique sequence of 67 residues belonged to the LEPR isoform C. The average lengths of the longest ICD isoforms were 188 residues (group 1), 210 residues (group 2), 82 residues (group 3), 335 residues (group 4) and 65 residues (group 5) (Table 1). IL-4Ra had the longest ICD of 575 residues (Table 1). Thus, the C1CR-ICDs are generally long and some have isoforms with unique sequences of considerable length.
The characteristic compositional bias of IDPs makes it possible to predict the degree of disorder in proteins computationally [106]. Almost 10 years ago, computational predictions of disorder were made for five of the C1CRs [107], but disorder predictors have since improved in quality and reproducibility [108], and no study has examined the entire family in unison. We therefore predicted the disorder profiles of the longest ICD isoforms of all 29 C1CRs as well as the propensity of regions to undergo folding-upon-binding using the ANCHOR scores [109] (Fig. 2a and SI Fig. S1). From these predictions, we observed that the ICDs of the entire family have high scores (>0.5) for disorder along their complete sequence and none were predicted to harbor folded domains. Furthermore, almost all receptors had lower disorder scores in the juxtamembrane 20-50 residues, a region overlapping with the JAK1/2/3 and TYK2 binding sites. Along the chains, regions of lower disorder propensity were observed, which coincided with high ANCHOR scores. Such signatures suggest that the region is prone to folding-upon-binding and thus constitutes a potential binding site [109]. Indeed, the dip with the lowest disorder score in PRLR-LF-ICD occurred around residue 610, which corresponds to the region of tyrosine phosphorylation by JAK2 (Y580/Y614) and to the docking site for STAT5 (YLDP). Comparing the profiles across the group 1 C1CRs revealed a similar pattern of disorder along the first 150-200 residues, although the extent of each of the regions with higher/lower disorder varied (Fig. 2a). Group-specific profiles with some similarity in the first half of the ICDs were likewise seen for groups 2, 3, and 4, but not for group 5 (SI Fig. S1), an observation likely reflecting the shorter ICDs of this group. Finally, we compared the disorder profiles for the five different isoforms of the PRLR-ICD (SI Fig. S2).
Despite the change in sequence, all the ICD isoforms were predicted to be disordered and with almost identical disorder profiles. This is consistent with the general observation that sequence may change in a family of proteins while the disorder-order profile persists [107,110]. In summary, the predicted disorder profiles support that the ICDs of all the C1CRs are disordered (Fig. 2b), and highlight common disorder profiles with a distribution of binding sites prone to folding-upon-binding.
The ICDs of the C1CRs have compositional biases distinguishing them from other IDPs
To address if the C1CRs-ICDs have physicochemical properties that distinguish them from other IDPs, we compared the amino acid content of the entire family (Fig. 2b) as well as the individual groups (SI Fig. S3) to those of folded proteins and IDPs (for details see methods). The analysis revealed that the C1CRs-ICDs indeed have global sequence compositions that stand out from other IDPs in three ways: First, some amino acids are depleted in the C1CRs-ICDs, namely Met, Arg, Ala and Lys, and are less frequent than in IDPs in general and in folded proteins. Second, Cys, Trp, Leu and Val are significantly more frequent in the C1CRs-ICDs than in other IDPs, and are as frequent as in folded proteins (except Val, which is less frequent than in folded proteins). Third, Pro is highly enriched in the C1CRs-ICDs, and is even more frequent than in both folded proteins and IDPs in general. These differences are remarkable, but the role of these global compositional biases in C1CR functionality remains to be understood. The depletion in positively charged amino acids could be related to prevention of detrimental interactions with the negatively charged inner membrane leaflet to which the C1CRs-ICDs are tethered through their TMDs, or with other negatively charged molecules. The enrichment in Cys, Trp, Leu and Val and the over-enrichment in Pro contrast with common observations for IDPs, in which these residues, when present, are often part of SLiMs. Indeed, the saturation with SLiMs along the C1CRs-ICD chains, as highlighted in Fig. 3 (see below), suggests a similar enrichment in binding sites and can point towards large interactomes. Pro residues are known to preserve disorder in regions of IDPs with residual structural propensities [111], and hence could counter-balance the effects of the enrichment in hydrophobic residues.
Furthermore, the chemistry of Pro causes rigidification of the backbone and consequently conformational expansion, as well as the formation of polyPro type II (PPII) helices by Pro-rich motifs [111]. Finally, several SLiMs and modification sites are Pro-based, including binding sites for JAKs and SH3s, and MAPK modification sites, which may increase the relative content of Pro in C1CR-ICDs.
Although sequence identity is often low among related IDPs, the sequence characteristics important for function are typically conserved, whether these are specific SLiMs, global conformational characteristics or specific functional domains [97]. Thus, regions of specific residue biases can be taken to represent domains of different chemical and structural properties, which may contribute differently to the function of the C1CRs-ICDs. To identify putative functional domains of specific physicochemical properties across the C1CR-ICDs we submitted the sequences to IDDomainSpotter [97]. IDDomainSpotter reveals distinct compositional biases in regions of long IDP sequences by calculating the fractions of specified residues in a sliding window of 15 residues, meaning that for each residue k, the fraction of the specified residues between k-7 and k+7 is given. Here, we have analyzed e.g. the charge composition by setting Lys and Arg as positive contributors (+RK) and Glu and Asp as negative contributors (-DE) to the net charge. Hence, each position within the sliding window around residue k counts as +1 if it is a Lys or Arg, -1 if it is a Glu or Asp, and 0 for any other residue.
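The sliding-window calculation described above can be illustrated with a minimal sketch. This is not the IDDomainSpotter implementation; the function name, the truncation of the window at the chain termini, and the normalization by the number of residues actually inside the window are assumptions made for illustration only, and the toy sequence is hypothetical.

```python
def windowed_net_charge(sequence, window=15):
    """Net charge fraction in a sliding window centred on each residue k."""
    # +1 for Lys/Arg (+RK), -1 for Asp/Glu (-DE), 0 for any other residue.
    charge = {"K": 1, "R": 1, "D": -1, "E": -1}
    half = window // 2  # 7 residues on either side for a window of 15
    profile = []
    for k in range(len(sequence)):
        # The window spans k-half .. k+half; truncating it at the termini
        # is an assumption of this sketch, not a documented behaviour.
        segment = sequence[max(0, k - half): k + half + 1]
        profile.append(sum(charge.get(aa, 0) for aa in segment) / len(segment))
    return profile


# Hypothetical toy sequence (not a receptor sequence): a positive stretch
# followed by a neutral linker and a negative stretch.
toy = "KKRKAAAAAADEDEE"
profile = windowed_net_charge(toy)
```

In such a profile, a positive stretch next to the start of the chain followed by a negative stretch would appear as a region of positive values followed by a region of negative values.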
The IDDomainSpotter analysis of the C1CR-ICDs revealed shared profiles for certain residues across the receptors (Fig. 2c), suggesting functional importance. First, they all shared a region of 10-20 residues in the region immediately following the TMD with a positive net charge, followed by a region of ~50-60 residues (only ~20 residues for the shorter TPOR) with a negative net charge (Fig. 2c, green). We denominate these regions as the positive domain (PD) and the negative domain (ND), respectively. For the ICDs of PRLR and GHR, the PD has been shown to specifically interact with negatively charged lipids of the inner leaflet of the membrane [79]. However, the role of the ND is not understood. The negative charges may be relevant for membrane repulsion, or for compaction with the PD when not membrane bound. Alternatively, it could provide negatively charged flanking regions for specific SLiMs, such as for Box2 binding to SH2 domains or Pro-rich motifs binding to SH3 domains, as recently supported by experiments [112]. Second, ~100 residues and onwards from the TMD, the net charge was close to equally balanced along the chains. Another shared property is the almost equal distribution of the unusually abundant Cys and Pro throughout the chains (Fig. 2c, orange and purple). This could suggest that their abundance is related to global conformational properties, rather than e.g. interaction sites or PPIIs. The ICDs of group 1 further shared a pattern of depletion and enrichment (20-50 residues) of the hydrophobic branched amino acids Ile, Leu and Val throughout their chain (Fig. 2c, red). Such hydrophobic side chains are usually less abundant in IDRs because of the energetic penalty of solvation, and hence in IDRs they are often primarily located in e.g. SLiMs, as mentioned above, but could be related to maintaining extended b-structures of relevance to binding. 
Finally, clustering of Phe, Tyr and Gly (+FYG) was analyzed, as IDRs enriched in these residues may be involved in liquid-liquid phase separation [113,114], but these were found to be low throughout the chains.
The patterns observed for the group 1 receptors are overall shared across the C1CR-ICDs, with the exception of some noteworthy variations in charge composition. Generally, all the C1CR-ICDs harbor a PD (sometimes in a shorter version), followed by an ND of various length, with subsequent close to net neutral charge along the chains. However, the short ICDs of group 3 only harbor a PD, and lack an ND. Furthermore, the ICDs of IL-31Rb (OSMR) (group 2) and IL-4Rα (group 4) lack regions of substantial net charge throughout their ICDs, including the PD; a trait that may be related to their association with the IL-31Ra and IL-13Ra1, respectively.
Short linear motifs allow expansion of the interactome
In the disorder profiles and the IDDomainSpotter analysis we observed distinct patterns, which may suggest the presence of multiple binding sites [115] (Fig. 2). For all receptors, the first region of low disorder propensity corresponds to the juxtamembrane region containing the most conserved and well-known motifs, Box1 and Box2, involved in JAK/TYK binding. As a Pro-rich motif, Box1 represents one of the most abundant SLiMs in the eukaryotic proteome [116]. The polyPro scaffold inherently provides a conformational bias towards PPII formation [117], which creates a structural predisposition that may drive an interaction via reduced entropic penalty of complex formation [118]. Except for group 5, which lacks canonical Box1 and Box2 motifs, all receptors harbor the conserved PXP motif in Box1 (SI Fig. S4), known to interact with the JAK-FERM domain [41,119]. However, in most receptors, Box1 is further extended to PXXPXP (consensus for group 1: jjPXjPXP, where j is any hydrophobic residue), which thereby accommodates both the minimal SH3 binding motif (PXXP) [120] and the FERM binding motif (PXP) in one combined SLiM (Table 2). This enables competitive binding to Src and JAK kinases [45,121].
In the membrane distal region, SLiMs constituting docking sites for various signaling proteins have been mapped experimentally. These SLiMs are predominantly activated by phosphorylation to recruit SH2-containing proteins, such as STAT and SOCS proteins [122,123]. Each group of receptors is known to preferentially recruit a specific STAT for activation [124] and each group therefore contains a specific subset of phospho-tyrosine motifs. Hence, group 1 harbors the STAT5 consensus motif, pYXXL [125], whereas group 2 harbors the STAT3 consensus motif, pYXXQ [126]. In addition to SLiMs related to distinct down-stream signaling, many of the experimentally known SLiMs are related to endocytosis, trafficking and degradation. Some of these are frequent and experimentally well-described motifs, such as the dileucine motifs (i.e. [D/E]XXX[L/I] and [D/E]XXLL) seen in LIFR [127] and IL-2Rb [128], and the tyrosine-based motifs (i.e. YXXΦ), promoting clathrin-dependent endocytosis and internalization, first identified in the TPOR (YRRL) and later in the G-CSFR [129,130]. Additionally, phosphorylation-dependent degrons (i.e. DSGXXS), promoting ubiquitin-dependent proteasomal degradation, are well characterized in both GHR and PRLR [131–133]. The motifs are summarized in Table 2.
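As an illustration of how consensus motifs of this kind can be located in a sequence, the following minimal sketch scans for a few of the motifs named above using regular expressions. It is a simple pattern match and not a substitute for a curated motif resource; the function name, the motif labels and the toy sequence are hypothetical. Because a plain sequence carries no phosphorylation information, pYXXL and pYXXQ are matched as YXXL and YXXQ.

```python
import re

# Consensus motifs written as regular expressions; "." stands for X (any residue).
MOTIFS = {
    "SH3 (PXXP)": "P..P",
    "FERM (PXP)": "P.P",
    "STAT5 (YXXL)": "Y..L",
    "STAT3 (YXXQ)": "Y..Q",
    "degron (DSGXXS)": "DSG..S",
}


def scan_motifs(sequence, motifs=MOTIFS):
    """Return 1-based start positions of every motif occurrence in the sequence."""
    hits = {}
    for name, pattern in motifs.items():
        # A lookahead is used so that overlapping occurrences are all reported.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", sequence)]
    return hits


# Hypothetical toy sequence, not a receptor sequence.
toy = "APAAPAYAALAAYAAQ"
hits = scan_motifs(toy)
```

Reporting overlapping occurrences matters here because a combined motif such as PXXPXP contains both a PXXP and a PXP match, which a non-overlapping search could miss.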
For the longest ICD-isoforms of group 1, we subsequently predicted SLiMs and phosphorylation sites using the Eukaryotic Linear Motifs server (ELM) [103] and the iGPS [104], respectively (see methods), and mapped these to the sequences, marking those already experimentally confirmed (Fig. 3b). We made three important observations. First, the predicted SLiMs, as well as their flanking regions, are rich in amino acids that promote extended structures, such as Pro, Val and Leu [134], in accordance with the structures adopted in the bound states [76,135,136] and the compositional analysis made above. Second, from the distribution of SLiMs along the chains, it was evident that clusters of overlapping SLiMs are frequent and distributed across the ICD sequences, interleaved with stretches depleted in SLiMs (Fig. 3b). Clusters of overlapping SLiMs can be considered as scaffolding hot spots where multiple binding events can take place in a controlled manner, largely determined by binding competition, i.e. affinity, concentration and PTMs. However, IDPs may also accommodate simultaneous binding of several partners and thereby orchestrate signaling by bringing relevant proteins into close proximity [137]. As evident from Table 2, similarities between the tyrosine-based motifs are pronounced. Consequently, STAT and SOCS binding motifs may overlap with each other, but also with phosphatase binding sites, e.g. for SHP2 [138–140], as well as with tyrosine-based internalization motifs. Thus, regulation of signaling fate by discrimination and availability via compressed motifs appears widespread in C1CRs and is critically linked to properties of disorder. Until recently, C1CR-regions with overlapping motifs have exclusively been characterized in the membrane distal regions. However, accumulating evidence suggests that the membrane proximal regions also contain SLiM-clusters.
In GHR, the membrane proximal region of ~60 residues contains a LID with an unknown function [79], a ubiquitin-dependent degron to which the E3-ligase bTrCP docks to promote GHR downregulation [131], as well as JAK2 [48] and Src kinase (LYN) binding sites [45] (Fig. 3c). JAK2 and LYN are the primary kinases in GHR signaling, controlling the activation of JAK2/STAT5 and MAPK pathways, respectively [45]. Both are known to be constitutively associated with the receptor ICD [45,141] and their relative activation of pathways can be perturbed by mutations in the ECD affecting TMD alignment [45]. The molecular details of how the change in TMD alignment is associated with pathway selectivity are still unknown, but selectivity may be controlled by binding competition between JAK2 and LYN and be further affected by membrane interaction. Similarly, GHR downregulation by bTrCP may likewise be driven by binding competition. Thus, this region represents one of the essential composite SLiM-clusters in GHR with hitherto unexplored implications for the regulation of GHR signaling.
Typically, multiple binding events in IDPs are regulated by phosphorylations and can be characterized as binary on/off switches. However, accumulating evidence has revealed that phosphorylations can generate much more complex responses (reviewed in [142]), and multisite phosphorylation can additionally generate sensitive threshold responses as well as graded responses. The third important observation we made was that in the C1CR group 1 receptors, several phosphorylation sites were predicted, but only a small subset of these were well-characterized experimentally. Hence, much remains to be understood in terms of regulation, and the many modification sites open the possibility that the C1CR-ICDs can have rheostat regulatory potential [16]. In this way, successive phosphorylations may additively increase (or decrease) binding affinity, enabling graded responses, or they may modulate the conformational ensemble, with impact on signaling output. Importantly, multisite phosphorylations, which are functionally relevant for IDPs, remain to be addressed in the C1CRs.
In summary, interactions by C1CR-ICDs are primarily mediated by SLiMs, creating docking and modification sites for several accessory signaling proteins. Furthermore, clusters of overlapping SLiMs dominate the C1CR-ICDs, which, together with the structural plasticity provided by the disorder properties, impart a unique condensed and versatile signaling scaffold, enabling establishment of large interactomes, whose content is controlled by the available pool and concentrations of interaction partners. The spatio-temporal orchestration of signaling therefore relies on the availability of binding partners and on their affinities and kinetics, which together eventually determine signaling fate.
Comparison of two PRLR isoforms – maintaining binding and dynamics
In order to investigate different C1CR-ICD isoforms at the molecular level, we expressed and biophysically characterized PRLR-SF1b (residues 259-288) and compared it with PRLR-LF-ICD. The sequence of PRLR-SF1b-ICD is much shorter (32 versus 386 residues) and differs in the last three C-terminal residues. Thus, K286, G287 and K288 of PRLR-LF-ICD are substituted by V286, T287, and P288 in the short isoform, with P288 constituting the new C-terminus. Apart from the loss of multiple interaction sites by being shorter, including the loss of Box2, the chemical change from net positively charged to uncharged with more hydrophobic residues may influence the structural preferences as well as the interactome, including membrane binding. The far-UV CD spectrum of PRLR-SF1b-ICD showed no indications of substantially populated secondary structure, with little negative ellipticity in the 222-210 nm range, and large negative ellipticity around 200 nm (Fig. 4a), which is a characteristic signature of IDPs. The largely disordered nature of PRLR-SF1b-ICD was further corroborated by a low dispersion in the 1H-dimension of the 1H-15N-HSQC NMR spectrum (Fig. 4b), indicating a homogeneity of the chemical environment of the backbone amides. One way to detect transient secondary structures is to measure Cα NMR chemical shifts to derive the secondary chemical shifts (SCSs) (see methods). Generally, consecutive positive SCSs indicate a-helical structure, whereas negative SCSs indicate extended structure [143,144]. The SCSs of the Cα of PRLR-SF1b-ICD (Fig. 4c) were overall slightly negative, with positive values only for A281 and H282. These constitute the N-terminus of the first transient a-helix, TH1, previously identified in the PRLR-LF-ICD (Fig. 4d). However, where PRLR-LF-ICD had positive SCSs for F279-T287, the PRLR-SF1b-ICD only had positive SCSs for A281 and H282. For residues N-terminal to TH1, the negative SCS signature remained similar for the two isoforms (Fig. 4e).
This difference strongly indicates that the three unique C-terminal residues V286, T287, and P288 in PRLR-SF1b-ICD distinctly lowered the a-helical content compared to PRLR-LF-ICD, while not changing the structural properties of the remainder of the sequence. However, similar to changes in structural propensities, changes in dynamics can affect how an IDP is presented to a binding partner. We therefore compared NMR parameters that report on fast-time scale backbone dynamics (ns-ps) measured on PRLR-SF1b-ICD to those of the first 100 residues of PRLR-LF-ICD [32] (SI Fig. S5). The average relaxation rates of PRLR-SF1b-ICD were similar to, but slightly lower than, those of PRLR-LF-ICD [32], and from a comparison to a segmental-motion model of a completely unstructured chain, similar fast timescale dynamics were seen over the length of the peptide (SI Fig. S5b, red line) [145]. Last, a plot of R2/R1 revealed no clear indication of non-random structure throughout PRLR-SF1b-ICD (Fig. 4f). Thus, both isoforms are disordered and have fast dynamics similar to a completely unstructured chain, in accordance with the disorder predictions. Remarkably, TH1, observed in the PRLR-LF-ICD, was eliminated in the PRLR-SF1b-ICD, demonstrating that two isoforms differing in only three residues may have different structural propensities.
PRLR-SF1b-ICD includes part of the first LID, and may thus interact with the membrane; alternatively, as two positively charged residues are substituted, membrane interaction could be abolished. To address this, 1H-15N-HSQC spectra were recorded of PRLR-SF1b-ICD in the presence and absence of small unilamellar vesicles (SUVs) consisting of 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine/1-palmitoyl-2-oleoyl-sn-glycero-3-phospho-L-serine (POPC/POPS), and chemical shift changes and intensity ratios were compared (Fig. 5a,b). The peaks corresponding to residues K259-F268 disappeared upon addition of POPC/POPS SUVs and chemical shift changes were observed for residues I276-L283 (Fig. 5a), indicating interaction with the SUVs. Furthermore, the low intensity ratio across the entire sequence suggested that the NMR observables for all of PRLR-SF1b-ICD were affected (Fig. 5b). Thus, the lipid interaction observed for PRLR-LF-ICD [79] was preserved in PRLR-SF1b-ICD, despite structural and chemical changes. This is in accordance with the lack of induced α-helicity in PRLR-LF-ICD upon lipid binding [79].
To investigate if the cis/trans propensity of the prolines differed between PRLR-LF-ICD and PRLR-SF1b-ICD, we utilized the backbone chemical shifts and sequences of both PRLR-LF-ICD and PRLR-SF1b-ICD to calculate the probability for cis conformation [146]. Even in IDPs, only ~5% of all proline peptide bonds are in the cis conformation [147,148], and for PRLR, only P269 was observed to have a slightly above average probability of being in cis for both isoforms, with an absolute cis proline probability of 8-10% (SI Table S1). In contrast to previous suggestions of cis/trans probabilities in an 8-mer Box1 peptide focusing on P272 (I267-P274) [81], our results suggested a very low probability of P272 cis conformation of 0.01 (1%) for both isoforms. This is in line with recent work demonstrating that the amount of cis-Pro is strongly overestimated in very short peptides, which supports that the cis probability in the PRLR Box1 peptide was previously overestimated [148]. Box1 was previously suggested to interact with CypA [149]. To investigate if PRLR-LF-ICD and PRLR-SF1b-ICD would interact with CypA, 1H-15N-HSQC spectra in the presence and absence of unlabeled CypA were recorded and chemical shift differences calculated (Fig. 5c,e). Weak NMR peaks originating from the lowly populated cis state of P269 were present in the spectra, and as CypA may be able to recognize either state, we investigated both peak sets. No changes were observed for either PRLR-LF-ICD or PRLR-SF1b-ICD, neither in the chemical shifts nor in the intensity ratios (Fig. 5d,f), suggesting that CypA does not interact with the PRLR-ICD isoforms. Thus, Pro cis/trans isomerization appears not to be dominant in the free state of the PRLR, which is also reflected in the absence of CypA interaction.
Network rewiring by isoforms
From the comparison of the two isoforms of PRLR, we only observed small differences in dynamics with preserved lipid binding, although structural differences between the two were prominent. However, a major reduction in the interactome will take place when removing 90% of the ICD, which removes numerous SLiMs and phosphorylation sites, including docking sites for STAT and SOCS. On the other hand, ICD isoforms with longer stretches of unique sequence may have gained additional interaction sites. For the C1CRs with different ICD-isoforms, we predicted which common binding-SLiMs were gained or lost, disregarding potential phosphorylation sites and receptor-unique SLiMs (Table S2). The unique sequences were found to carry distinct SLiMs. In the case of group 1, a 14-3-3 binding SLiM, a motif first shown to be active in cytokine receptors in the IL-9R [151], has previously been identified in PRLR isoform 1 [150], and predictions suggested the presence of an additional 14-3-3 motif. Compared to the PRLR-LF-ICD, however, the short forms did not include the STAT-docking site or the original 14-3-3 binding SLiM. Instead, a different 14-3-3 binding SLiM was present in the unique sequences (SF1a and SF1c) (Fig. 6a). For TPOR, isoform 2 had two 14-3-3 SLiMs, which were absent in isoform 1, while the unique sequences of GHR and EPOR isoforms, which do not have any known 14-3-3 binding SLiMs, were much shorter and without any predictable common SLiMs. Thus, for PRLR, the preservation of the 14-3-3 SLiM suggests a key regulatory function, one of which may be to attenuate receptor signaling, as suggested [150]. Relevantly, the LFs of EPOR, GHR and PRLR all had a phosphorylation dependent degron, interacting with the SkpSCF-betaTrCP1 complex or the Skp1_Cullin-Fbox, leading to ubiquitylation and degradation, as shown experimentally for PRLR and GHR, where it negatively regulates receptor stability [131,132].
These were not identified in any of the shorter isoforms, which also have been seen to be stabilized on the membrane [152], perhaps because of the lack of associated proliferative signaling and hence lack of need for immediate down regulation.
For other receptors, e.g. the IL-31Ra (GLMR) and LEPR isoforms, we found that unique sequences introduced PDZ binding SLiMs at the new C-termini (Fig. 6b). Furthermore, when present, each IL-31Ra (GLMR) receptor isoform had a unique PDZ binding SLiM, allocating the isoforms to interact with different classes (class 1, 2 and 3) of PDZ domains (Fig. 6b) [153,154]. The same was true for the LEPR (Fig. 6c). In fact, the introduction of a PDZ SLiM in the C-terminus in one isoform, absent in the other, was observed for several receptors including G-CSFR, IL-7Ra, IL-9Ra, and GM-CSFRa (Table S2). Why these isoforms need PDZ binding motifs is not clear, but several scaffolding proteins with specialized subcellular localization and tissue specificity exist, known to contain multiple PDZ domains by which they orchestrate supramolecular complexes. Binding of the IL-5Ra-ICD to a PDZ domain from syntenin (Fig. 1b) [78] supports the involvement of further scaffolding proteins in the formation of larger signaling complexes. PDZ-containing proteins may be of relevance to the C1CRs and could include proteins from the NHERF and PSD-95 families [155], which also scaffold kinases such as Fyn [156]. Alternatively, E3-ligases belonging to the MARCH family coordinate binding via PDZ domains and are relevant for ubiquitylation of proteins in the intracellular membranes [157]. However, besides the complex between the IL-5Ra-ICD and the PDZ domain from syntenin, complexes of class 1 cytokine receptors with PDZ domains remain to be experimentally explored. Finally, for all receptors with isoforms, the longest isoform carries the interaction sites for STATs, either STAT5 or STAT3 or both, and additionally carries a binding motif for TNF receptor-associated factor (TRAF)-2 or TRAF-6, none of which are found in the other, shorter isoforms. In a few cases, the STAT and/or TRAF motifs are maintained in the second longest isoform, and sometimes a shift between STAT5 and STAT3 or between TRAF-2 and TRAF-6 occurs.
Thus, for the C1CRs, the disorder predictions and experimental characterization of selected representatives have suggested that the isoforms maintain structural disorder, and their presence suggests several mechanisms by which disorder orchestrates signaling. The first is the complete removal of a large part of the ICD, eliminating SLiMs, typically those for STAT activation, TRAF interaction and downregulation by degradation via degron activation. In this way the shorter isoforms act as negative regulators, or decoy receptors, of signaling, as seen for the short forms of the PRLR and GHR [59,152]. However, these isoforms still maintain binding capacity, as seen for the membrane binding of PRLR-SF1b-ICD above. The second mechanism by which isoforms orchestrate signaling is via rewiring of the interactome to access completely new networks, exemplified by the addition and removal of binding sites for e.g. 14-3-3 proteins and PDZ domains. This allows for different signaling profiles dependent on expression profiles of the C1CR-ICD isoforms. However, more studies into the network rewiring of the C1CRs are warranted, and the analysis made here provides a starting point.
The conformational ensemble of C1CR-ICDs
IDPs are functional without taking on a single, well-defined tertiary structure. Yet, they cannot adequately be described as simple statistical coil chains equally populating all possible conformations allowed by their backbone torsion angles. Instead, IDPs display varying degrees of compaction and elongation, and contain transient, short- and long-range structural organizations. Hence, the disorder of the C1CR-ICDs not only confers flexibility and high accessibility of binding sites, but certain chain dimensions and spatial organizations may also influence the organization of the signaling complexes and orchestration of protein interactions, and in the end, signaling outcome. Currently, the conformation and dimensions of IDPs cannot be quantitatively predicted from sequence [158,159]. Nonetheless, the balance between chain-chain and chain-solvent interactions that determines the conformational preference is related to specific sequence features that influence the conformational ensembles in predictable ways [98,158,160,161]. One set of these relates to global compositional sequence features (i.e. parameters that are independent of the sequence order), of which the fraction of charged residues and the net charge per residue are particularly important [160,162]. In addition, features relating to sequence patterning, especially the patterning of oppositely charged residues and expansion promoting residues, influence compaction [159]. However, the current difficulties in consistently predicting the conformational ensemble of all IDPs reflect that these behaviors are encoded in sequence features yet to be unraveled.
IDPs have been classified into five compositional groups in a diagram of states [162] based on their fraction of positively charged residues (f+) and fraction of negatively charged residues (f-). These two global parameters are combined into two measures underlying a diagram of states: the fraction of charged residues (FCR = f+ + f-) and the net charge per residue (NCPR = f+ - f-). An explanation of the relation between these parameters and the properties of the chain is given in the supplemental data. Of 879 IDRs longer than 15 residues found in DisProt, CIDER [98] classified 40% as belonging to R1, 35% to R2, 22% to R3, and 3% to either R4 or R5 [158]. For each C1CR-ICD, the sequence of isoform 1 was submitted to CIDER [98], except for LEPR and G-CSFR, for which isoform B and 3, respectively, were selected as these were the longest isoforms (see Table 1). For GM-CSFRα, isoforms 1 and 2 were both analyzed because they differed in more than 50% of their C-terminal sequences (see Table 1). The C1CR-ICDs generally fell close to the boundary between R1 and R2, with most belonging to R1 (61%) (Fig. 2a), suggesting a preference for compact, but still dynamic, heterogeneous conformational ensembles [158]. Nonetheless, in particular for sequences belonging to R2, their overall charge neutrality means that their conformational preference cannot be predicted from global composition alone [158,160]. Furthermore, it should be noted that the boundary between R1 and R2 has been determined ad hoc, and has been suggested to be positioned at lower FCR for longer sequences [158,160]. Moreover, for ICDs > 100 residues or with a high proline fraction (> 0.15), no qualitative prediction of the conformation can be made for sequences of R1, as these tend to have more extended conformations than their scores predict.
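The two global charge parameters defined above, FCR and NCPR, follow directly from the amino acid counts of a sequence. The minimal sketch below computes them for illustration only; it is not the CIDER implementation. The function name is hypothetical, counting only Lys and Arg as positive and only Asp and Glu as negative (i.e. ignoring His) is an assumption of this sketch, and the toy sequence is not a receptor sequence.

```python
def charge_parameters(sequence):
    """Return f+, f-, FCR and NCPR for an amino acid sequence."""
    n = len(sequence)
    # Assumption of this sketch: Lys/Arg are positive, Asp/Glu are negative,
    # and all other residues (including His) are treated as neutral.
    f_plus = sum(sequence.count(aa) for aa in "KR") / n
    f_minus = sum(sequence.count(aa) for aa in "DE") / n
    return {
        "f_plus": f_plus,
        "f_minus": f_minus,
        "FCR": f_plus + f_minus,   # fraction of charged residues
        "NCPR": f_plus - f_minus,  # net charge per residue
    }


# Hypothetical toy sequence: two positive and two negative residues in ten.
params = charge_parameters("KKDEAAAAAA")
```

For this toy sequence, f+ and f- are both 0.2, so FCR is 0.4 while NCPR is 0, which is the signature of a charge-balanced polyampholyte.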
Since almost all the C1CR-ICDs are long IDPs of R1 and R2, their conformational preferences cannot be predicted from global composition alone but may also be influenced by e.g. sequence patterning. In particular, the patterning, or mixing, of oppositely charged residues is important, as is that of expansion-driving and aromatic residues. The parameter κ reports how well positively and negatively charged residues are segregated across the sequence and is normalized between 0 and 1, with κ close to 0 representing sequences with evenly distributed charges, while sequences with κ close to 1 have highly segregated charges. It has been shown that as κ approaches 1, the conformational ensemble becomes more compact [162]. However, since κ is calculated by normalizing to the most segregated sequence of the given composition, a specific κ value will not have the same meaning for two sequences with different FCR and |NCPR| values. Furthermore, for long IDRs, such as most of the C1CR-ICDs, κ is calculated only within windows of 5 and 6 residues, ignoring long-range effects. κ is most informative for sequences with an FCR above 0.25 and an NCPR between -0.1 and +0.1, for which a κ below 0.12 is considered low and a κ above 0.25 is considered high. Especially for polyampholytic sequences with an FCR beyond 0.4, charge patterning is predicted to have a major impact on the conformation [161]. There is one such example among the C1CR-ICDs, namely GM-CSFRα, which has an FCR of 0.41, an NCPR of -0.07 and a κ of 0.25, suggesting chain compaction.
The position of the vast majority (94%) of the C1CR-ICDs in R1 and R2 is a consequence of their low net charges. Their FCR values are in the intermediate range of 0.1 < FCR < 0.3 [161], while at the same time, their NCPR is close to 0, demonstrating that they are near-symmetrical polyampholytes. For the C1CR groups with long ICDs (1, 2 and 4), the group average FCRs (0.21; 0.23; 0.19) and NCPRs (-0.06; -0.06; -0.05) are remarkably similar, suggesting that charge properties are a conserved trait. The similarity of these parameters also allows us to compare their κ values more directly, which range from a group average of 0.20 for group 1 and 0.18 for group 2 to 0.22 for group 4. This is consistent with the IDDomainSpotter analysis presented earlier (Fig. 2c). Here we found that almost all of the C1CR-ICDs harbored a PD immediately following their TMD, succeeded by an ND, and with net charge neutral regions for the remainder of the chain. Together, this suggests that the influence of the global charges and the charge patterning on the conformational ensembles is consistent throughout groups 1, 2 and 4, except for IL-31Rβ (OSMR) (group 2) and IL-4Rα (group 4). As mentioned, the shorter ICDs of groups 3 and 5 result in somewhat different global charge properties.
The Ω parameter describes the patterning of both the charged residues and proline. Like κ, Ω is normalized between 0 and 1, with Ω close to 0 representing sequences with evenly distributed charges and prolines, while sequences with Ω close to 1 have highly segregated charges and prolines [163]. It has been shown that when Ω approaches 1, the preference for expanded conformations increases [163]. A high fraction of Pro (> ~15%) may cause more expanded conformations, as Pro prefers to be solvated and promotes stiffness. Five of the C1CR-ICDs had a high fraction of Pro: IL-27R, IL-6Rα, IL-11Rα, βc and IL-13Rα2 (Fig. 7a, top). The amino acid fraction and IDDomainSpotter analysis (Fig. 2) revealed that the Pros are unusually abundant in the C1CR-ICDs and close to equally distributed. From the CIDER analysis we found that Ω, like κ, is similar for many of the C1CR-ICDs, but is lower for the shorter sequences (Fig. 7a). This could simply be a consequence of the Pro-rich Box1 sequences, leading to relatively higher scores in the shorter sequences.
To summarize, the theoretical analysis of the C1CR-ICD sequence parameters known to influence compaction suggests that many of the ICDs may be similarly biased towards a specific degree of extension or compaction of their conformational ensembles, but that this degree cannot be predicted from sequence. Hence, to determine this bias, we experimentally investigated the degree of compaction and its responsiveness to salt by SAXS using two long, representative C1CR-ICDs, namely the PRLR-LF-ICD (Fig. 7b,c) and the GHR-LF-ICD (SI Fig. S6). The SAXS profiles of the PRLR-LF-ICD were consistent with those expected for fully disordered proteins, and were fitted to an Rg of 57.3 ± 1.4 Å in 20 mM phosphate buffer. The predicted Rg of the PRLR-LF-ICD for a fully random coil state is, according to Kohn et al. [95], 65 Å, suggesting that the PRLR-LF-ICD in isolation populates a slightly compacted ensemble. The pair distance distribution function (P(r)), which is a histogram of the pairwise distances within the protein, peaks at ~45 Å and has a Dmax of ~200 Å. Increasing the concentration of salt to 300 mM did not significantly affect the fitted Rg or the P(r) distribution (Fig. 7b,c), suggesting that the global degree of compaction of the PRLR-LF-ICD is not sensitive to salt, in contrast to what is often observed for more charged IDPs [164]; this is perhaps related to the high content of Pro and branched amino acids. The same trends were observed from SAXS on the GHR-LF-ICD, which had a similarly slightly compacted ensemble that was insensitive to salt (SI Fig. S6). Hence, the ICDs across the C1CR family, having similar global charge properties and patterning (Fig. 7a), may populate similarly compacted ensembles, although this remains to be verified experimentally on a broader set of receptors.
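The comparison between the fitted Rg and the random coil expectation can be sketched as follows. The sketch assumes that the random coil reference follows a power law of the form Rg = R0 · N^ν; the default values R0 = 1.927 Å and ν = 0.598 are those commonly quoted for chemically denatured proteins and are stated here as assumptions, as the present text only reports the resulting 65 Å for the PRLR-LF-ICD. The function names and the chain length used in the demonstration are hypothetical.

```python
# Minimal sketch: random coil Rg from a power law, and a compaction ratio.
# Assumption: Rg_coil = R0 * N**nu with R0 = 1.927 Angstrom and nu = 0.598.


def random_coil_rg(n_residues, r0=1.927, nu=0.598):
    """Expected radius of gyration (Angstrom) of a random coil of N residues."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return r0 * n_residues ** nu


def compaction_ratio(rg_measured, rg_coil):
    """Ratio of measured to random coil Rg; values below 1 indicate compaction."""
    if rg_coil <= 0:
        raise ValueError("rg_coil must be positive")
    return rg_measured / rg_coil


if __name__ == "__main__":
    # Values reported in the text for the PRLR-LF-ICD: 57.3 A measured, 65 A predicted.
    print(round(compaction_ratio(57.3, 65.0), 2))
    # Hypothetical chain length, only to show how the reference scales with N.
    print(round(random_coil_rg(100), 1))
```

With the values reported above for the PRLR-LF-ICD, the ratio is approximately 0.88, in line with a slightly compacted ensemble. Since the uncertainty of the scaling parameters propagates into the reference value, small deviations from 1 should be interpreted with caution.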
Versatile and controlled orchestration of signaling by unique structural disorder in C1CRs
It is remarkable that the entire family of C1CRs, differentiating into > 50 isoforms, is predicted to be disordered throughout the entire ICD sequence. Nonetheless, the disordered ICDs are critically understudied, leaving us with a naive and overly simplistic schematic view of the ICDs as passive strings of varying lengths with kinases constitutively attached. In the present paper we have highlighted the properties linked to disorder that are responsible for controlling the diverse signaling by C1CRs (Table 2) and asked: Why has disorder been selected for governing intracellular C1CR signaling? Their completely disordered nature stands in contrast to the majority of other types of single-pass transmembrane receptors such as the receptor tyrosine kinases, where intracellular signaling is mainly governed by intrinsic kinase activity. We have here shown that the long disordered ICDs of C1CRs are brimming with clusters of multifunctional SLiMs throughout their length, suggesting that one explanation is the signaling versatility and scaffolding capacity of this type of ICD. Furthermore, we have outlined that overlapping SLiMs are prevalent in the C1CR-ICDs, hinting that the disordered ICDs further allow for complex regulation of diverse signaling through competition among, and regulation of, interactions with a plethora of different binding partners through multispecificity. Thereby, activation becomes dependent on the coupled equilibria and kinetics of two (or more) binding events. Indeed, the ability of a distinct region in the disordered ICD to bind to many different proteins is facilitated by structural adaptation and folding-upon-binding [165,166]. Additionally, the C1CR-ICDs are hot spots for multiple phosphorylation events, of which only a few are well-characterized as binary on-off switches. These directly, or indirectly, affect affinities and additionally expand the number of states accessible to the chain at any time.
We have further suggested that an additional layer is added to this regulation by the existence of different C1CR-ICD isoforms, in which entire groups of SLiMs can be eliminated and new ones added, a feature that is much easier for IDPs to acquire successfully during evolution than for folded proteins. By controlled expression of the isoforms, the interaction network can be completely rewired. Hence, the fully disordered nature of the C1CR-ICDs allows for a fascinatingly versatile and complex interaction hub.
Can such signal complexity be facilitated through a simple string with kinases attached? Our sequence analysis and experimental studies have revealed biases in the C1CR-ICDs that differentiate them from simple statistical coils. They have conserved, distinct compositional biases that differentiate them from other IDPs; these biases are distributed throughout their chains and include the presence of disordered domains of specific physicochemical properties. This suggests that these compositional biases represent shared functionalities yet to be characterized. Our experimental SAXS data on the long ICDs of the archetypical receptors GHR and PRLR revealed that they are slightly more compacted than expected for a fully random coil (~57 Å versus 65 Å for PRLR-LF-ICD). This indeed suggests an inherent conformational bias based on the conservation of certain sequence properties maintained across the family. Importantly, however, it should be kept in mind that in the cell, the C1CR-ICDs are most likely never completely devoid of interactions at any point. Previous characterizations of PRLR-LF-ICD and GHR-LF-ICD have revealed the presence of LIDs, further suggesting distinct organizational features at the membrane interface, where many kinases are also tethered. Additionally, we found that the same sets of SLiMs are placed differently in the C1CR-ICDs with variable distances between them, which may provide an additional tuning of the signaling outcome, both via the length of their disordered spacers and via the properties of these [167]. Thus, SLiM organization within the chain may imprint different affinities in different complexes despite their exploitation of identical SLiMs, providing an additional layer of the spatio-temporal orchestration of signaling.