An online GPCR structure analysis platform

We present an online, interactive platform for comparative analysis of all available G-protein-coupled receptor (GPCR) structures and their correlation with functional data. The platform encompasses structure similarity, secondary structure, protein backbone packing and movement, residue–residue contact networks, amino acid properties and prospective design of experimental mutagenesis studies. It lets any researcher apply sophisticated structural analyses in basic and applied receptor research, and analyze and visualize structure–function relationships across scales from atomic interactions to protein backbone rearrangements.

All quantitative values in a structure set represent means from all receptors and their individual structures, avoiding skewing of comparisons. The visualizations are interactive plots tailored to the type of data: transmembrane helix 1-7 (TM1-7) segment movement (two-dimensional (2D) and three-dimensional (3D) segment plots), receptor segment contacts (Flareplot, 2D and 3D network), residue contacts (Flareplot, heatmap, 2D and 3D networks and 3D structure), residue contact frequencies (box plot) and residue properties (box plot, heatmap, scatter plot, snakeplot and 3D structure). Altogether, this tool allows for swift yet powerful analysis of distances, movements, topology, distributions and differences to identify correlations across the macro- to micro-scales, that is, from TM helix bundle contacts to residue backbone kinks, sidechain rotamers and atomic interactions.
The 'Structure similarity trees' (https://review.gpcrdb.org/structure_comparison/structure_clustering) allow for conformational clustering of any set of GPCR structures through an exhaustive all-to-all Cα-Cα distance pair comparison across structurally equivalent residue positions (~24,000 distances/receptor) and average linkage clustering (Methods). This technique is independent of sequence similarity and avoids the biases of traditional 3D alignment methods based on root mean square deviation values, which vary depending on the template and superposition region 12. Clustering receptors based on their conformation instead of sequence aids correlation to receptor function, which is further facilitated by mapping of state, endogenous ligands and G proteins (Fig. 2).
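The superposition-free comparison described above can be illustrated with a minimal sketch. It assumes each structure is given as a mapping from generic residue position to Cα coordinates, and it scores two structures by the mean absolute difference of their corresponding intra-structure Cα-Cα distances. The function names and the scoring formula are illustrative assumptions, not the platform's implementation (the actual score is defined in the Methods).

```python
from itertools import combinations
from math import dist


def ca_distance_fingerprint(ca_coords):
    """Return all pairwise Cα-Cα distances within one structure.

    ca_coords maps a residue position label to an (x, y, z) tuple.
    """
    positions = sorted(ca_coords)
    return {
        (p, q): dist(ca_coords[p], ca_coords[q])
        for p, q in combinations(positions, 2)
    }


def structure_distance(coords_a, coords_b):
    """Mean absolute difference of equivalent Cα-Cα distances (assumed score).

    Only positions present in both structures are compared, so no 3D
    superposition and no reference template is needed.
    """
    shared = set(coords_a) & set(coords_b)
    fp_a = ca_distance_fingerprint({p: coords_a[p] for p in shared})
    fp_b = ca_distance_fingerprint({p: coords_b[p] for p in shared})
    if not fp_a:
        raise ValueError("need at least two shared residue positions")
    return sum(abs(fp_a[k] - fp_b[k]) for k in fp_a) / len(fp_a)


# Hypothetical toy coordinates: b is a translated copy of a, c has one moved atom.
a = {"1x50": (0.0, 0.0, 0.0), "2x50": (3.0, 0.0, 0.0), "3x50": (0.0, 4.0, 0.0)}
b = {k: (x + 1.0, y + 1.0, z + 1.0) for k, (x, y, z) in a.items()}
c = dict(a, **{"2x50": (6.0, 0.0, 0.0)})

print(structure_distance(a, b))  # rigid-body shift leaves the score at zero
print(structure_distance(a, c))  # a moved residue raises the score
```

Because only internal distances are compared, a rigid-body translation or rotation of one structure does not change the score. A matrix of such pairwise scores is the kind of input that average linkage clustering operates on.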
To show the utility of the platform to uncover functional determinants, we applied the Structure comparison tool to list residues that stabilize receptors in an inactive/active state and suggested mutations in a 'State-stabilizing mutation design tool' (https://review.gpcrdb.org/mutations/state_stabilizing, Extended Data Fig. 1a). For each receptor and state, two complementary rationales are presented: removing residues stabilizing the undesired state (alanine mutation) or introducing residues stabilizing the desired state (consensus amino acid from the structures in the desired state). By comparing the overlap of residue positions with those of literature ligand activity-altering and structure construct mutations stored in GPCRdb 10,11 (>35,000 data points), we find that state determinants have more functional receptor expression, ligand activity (binding and/or efficacy) and thermal stability data than non-determinants (Extended Data Fig. 1b,c). Although this Brief Communication presents an online platform only, we invite the wider community to also deposit new mutagenesis results via the standardized spreadsheets for ligand activity and structural biology experiments in the GPCRdb research hub.
Albert J. Kooistra 1,3 ✉, Christian Munk 1,2,3, Alexander S. Hauser 1 and David E. Gloriam 1 ✉

Key to all comparative structure analysis is the selection of the best templates based on quality, diversity and representativeness of the given function or property of interest. The template selection interface inside the Structure similarity tool facilitates this by presenting: (1) receptor classification by classes (evolutionary relationships), ligand type (for example, peptide or lipid receptors) or receptor families (sharing endogenous ligand); (2) completeness (percentage of sequence); (3) species and homology to the human protein; (4) structure determination method and resolution; (5) receptor activation state; (6) ligand and its modality; (7) signal protein family or subtype; and (8) auxiliary fusion proteins or antibodies. Structures are updated monthly to mirror the Protein Data Bank 13 and authors can add their new structures via collaborations exploiting the analysis tools towards publication. Given the >500 GPCR structures (mean six per receptor), a number that is increasing rapidly, this annotated reference source will facilitate selection of the most appropriate structural templates for a range of scientific studies.
The single biggest factor affecting the receptor structure conformation is the activation state. Consequently, misleading conclusions arise from discrepancies between pharmacological states, defined by the ligand modality and presence of an effector G protein, and conformational states, predominantly based on the outward movement of TM6 opening the effector site upon activation 4,6,14. Examples of such discrepancies include (1) agonist-stimulated structures without an effector G protein, (2) effector sites opened by allosteric modulators, (3) fusion proteins moving the cytosolic part of TM6 and (4) helix 8 repacking to transmembrane helices. Furthermore, the TM6 activation switch behaves differently across GPCR classes 5,14.
To address these problems, we present a 'degree active' percent measure based on overall similarity with reference structures with consistent pharmacology and structural integrity, and a classification of all GPCR structures into an inactive, intermediate, active or 'other' conformational state (Methods, https://review.gpcrdb.org/structure). This should help researchers correctly characterize the conformational state of ambiguous structures and avoid artifacts in the structural basis.
Taken together, the online platform makes sophisticated comparative structure analysis accessible to a broad research community. It features unique classification of receptors, structure states and techniques. The Structure similarity trees based on all-to-all Cα atom distance pairs 15, unlike superposition approaches comparing root mean square deviation values, allow for more consistent comparisons by avoiding the uncertainty of which template to use and the difficulty of identifying the superimposable substructure that did not move (if any). Unlike previous resources 3,16,17, the Structure comparison tool can compare a group or two sets of structures with respect to backbone secondary structure and residue properties, and visualize results in over 20 interactive plots. Furthermore, the contact percent frequencies presented herein are directly interpretable (unlike the only other available score 4) and allow identification of determinants differing in opposite structure sets (ref. 17 only supports one set and ref. 16 cannot compare structures). This tool's built-in anti-skewing averaging of quantitative properties (Methods) eases use and, together with the extensive target selection table including, for example, resolution cut-offs, makes results more robust. Finally, the State-stabilizing mutation design tool has the advantage that it identifies functional determinants, not based on individual pairs 4,6,16,17, but on the net sum of all contacts to other residues. Therefore, the platform could be applied to uncover determinants of, for example, constitutive activity and ligand-dependent biased signaling, allosteric modulation, efficacy or modality, and will grow even stronger as structural biology continues to advance.

The structural templates include all GPCR structures, which can be analyzed for variability in one set or differences between two sets. Results can be filtered to identify functionally critical residue contacts, residue backbone and sidechain movements, residue properties and helix types, bulges and constrictions. The tool includes >20 tailored visualizations.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41594-021-00675-6.

Structures are subsequently grouped based on the acquired all structures-against-all structures distance matrix using average linkage clustering. Tree nodes have grayscale coloring illustrating the separation of clades based on a silhouette score 18, which represents the score for all structures under this node compared with all structures under the next closest node. This score quantifies the separation from −1 (wrong, light red) via 0 (non-significant, white) to 1 (perfect, black).
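A node-level silhouette score of the kind described above can be sketched as follows. This is a minimal illustration that assumes a precomputed symmetric distance matrix and two lists of structure indices (the node and its next closest node). It uses the standard silhouette definition, (b − a) / max(a, b), averaged over the structures under the node; the function name and the handling of single-member nodes are assumptions.

```python
def node_silhouette(dist, node, neighbor):
    """Average silhouette of the structures under `node` versus `neighbor`.

    dist     : square matrix (list of lists) of structure-structure distances
    node     : indices of the structures under the node being scored
    neighbor : indices of the structures under the next closest node
    """
    scores = []
    for i in node:
        others = [j for j in node if j != i]
        if not others:
            # Assumed convention: a single-member node scores 0.
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in others) / len(others)      # cohesion
        b = sum(dist[i][j] for j in neighbor) / len(neighbor)  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy matrix: structures 0-1 are close, 2-3 are close,
# and the two pairs are far apart.
D = [
    [0.0, 1.0, 4.0, 4.0],
    [1.0, 0.0, 4.0, 4.0],
    [4.0, 4.0, 0.0, 1.0],
    [4.0, 4.0, 1.0, 0.0],
]

print(node_silhouette(D, [0, 1], [2, 3]))  # well separated grouping
print(node_silhouette(D, [0, 2], [1, 3]))  # wrong grouping gives a negative score
```

Values near 1 indicate a clade that is compact and far from its neighbor, values near 0 indicate no meaningful separation, and negative values indicate that members sit closer to the neighboring clade than to their own.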

Generic residue numbering.
Corresponding residue positions in each class were indexed with the structure-based GPCRdb generic residue numbering system 24. This builds on the sequence-based generic residue numbering systems for class A (Ballesteros-Weinstein), B1 (Wootten), C (Pin) and F (Wang), but carries gaps from a structural alignment of two receptors, caused by a unique helix bulge or constriction, over into the sequence alignment, thereby avoiding an offset of the affected and all following residues. All schemes assign residue numbers relative to the most conserved amino acid residue, which is given the number 50, and prefix them with the TM helix number (for example, 3x32 is on TM3 and 18 positions before the reference). This generic residue numbering scheme also uniquely indexes helix 8 and structurally conserved loop segments, which are numbered by the preceding and following TM helix (for example, 45 is ECL2 located between TM4 and TM5).
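As a small worked example of this numbering, the sketch below parses a label such as 3x32 into its segment prefix and its offset from the reference position 50. It assumes the plain `<segment>x<two-digit position>` form only and that loop residues carry a two-digit segment prefix (for example, 45x50). The function names are hypothetical.

```python
import re

# Assumed label form: one- or two-digit segment prefix, 'x', two-digit position.
_GENERIC_NUMBER = re.compile(r"^(\d{1,2})x(\d{2})$")
REFERENCE_POSITION = 50  # the most conserved residue is numbered 50


def parse_generic_number(label):
    """Split a generic residue number into (segment, position, offset)."""
    match = _GENERIC_NUMBER.match(label)
    if match is None:
        raise ValueError(f"unsupported generic residue number: {label!r}")
    segment = match.group(1)
    position = int(match.group(2))
    return segment, position, position - REFERENCE_POSITION


def describe_segment(segment):
    """Translate a segment prefix into a readable description."""
    if len(segment) == 2:
        # Loop segments are numbered by the preceding and following TM helix.
        return f"segment between TM{segment[0]} and TM{segment[1]}"
    if segment == "8":
        return "helix 8"
    return f"TM{segment}"


segment, position, offset = parse_generic_number("3x32")
print(describe_segment(segment), position, offset)
print(describe_segment("45"))
```

For 3x32 the offset is −18, matching the "18 positions before the reference" reading in the text.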

Structure comparison tool.
We created an extensive integrated online tool for comparative GPCR structure analysis (https://review.gpcrdb.org/structure_comparison/comparative_analysis). Its 'Structure selection table' was developed to extend the annotation in the GPCRdb structure browser with additional data aiding selection of representative templates for GPCRs and activation states. Integration with other GPCRdb analysis tools and stand-alone resources was implemented through import and export of PDB codes. Separate analysis modes were developed for analysis of a single structure, structure set or structure set pair. We generated separate data browsers for: (1) contact position pairs, (2) contact position-AA pairs, (3) residue backbone and sidechain movement and (4) residue helix types, bulges and constrictions. The data browsers present comprehensive data on contact, helix and residue properties (Supplementary Table 1). To mitigate skewing upon comparison of sets of few versus many diverse structural templates, each quantitative value in a structure set was calculated as the mean over all receptors, each of which is in turn represented by the mean value over all its structures. For angles, we used the mean of circular quantities. By contrast, values are not averaged across structure states (inactive, intermediate, active and other), which can instead optionally be separated into two different structure sets before analysis. Integrated into the tool, we developed >20 interactive visualizations, for example, snakeplot (topology), scatter plot (correlations), box plot (distributions and differences), heatmap (distances or movements) and structure (movements).
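The two-level averaging and the circular mean for angles can be sketched as below. The sketch assumes values are grouped per receptor in a dictionary and angles are given in degrees. The function names and the toy numbers are illustrative, not taken from the tool.

```python
from math import atan2, cos, degrees, radians, sin
from statistics import mean


def set_mean(values_by_receptor):
    """Mean over receptors of the per-receptor mean over structures."""
    return mean(mean(values) for values in values_by_receptor.values())


def circular_mean(angles_deg):
    """Mean of circular quantities, returned in degrees within [0, 360]."""
    s = mean(sin(radians(a)) for a in angles_deg)
    c = mean(cos(radians(a)) for a in angles_deg)
    return degrees(atan2(s, c)) % 360.0


def set_circular_mean(angles_by_receptor):
    """Two-level circular mean: per receptor first, then across receptors."""
    return circular_mean(
        [circular_mean(angles) for angles in angles_by_receptor.values()]
    )


# Hypothetical distances: receptor A has four structures, receptor B has one.
distances = {"A": [1.0, 1.0, 1.0, 1.0], "B": [3.0]}
print(set_mean(distances))  # each receptor counts once
print(mean(v for values in distances.values() for v in values))  # pooled mean

# Angles on either side of 0 degrees average to about 0, not 180.
print(circular_mean([350.0, 10.0]))
```

In the toy data the pooled mean is dominated by the receptor with many structures, whereas the two-level mean weights both receptors equally, which is the anti-skewing effect described above.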

State-stabilizing contact determination.
We developed two browsers, for any and specific amino acid combinations, respectively, for residue-residue pair contacts in our webserver for comparative structure analysis along with plots to visualize these in a 3D structure 22, Flareplot 17, network (2D and 3D, adapted from https://d3js.org) or heatmap. Contact definitions and defaults for intra- and intersegment sidechain and backbone contacts are explained in the settings menu of the webserver. For each residue in a receptor structure, non-hydrogen atoms in close proximity to non-hydrogen atoms from neighboring residues are taken into account. These potential contacts are further evaluated based on atom and residue types and their distance. For each of the contact types, the default maximum distances are ionic (4.5 Å), polar (4 Å), aromatic (stacking 5.5 Å and cation-π 6.6 Å), hydrophobic (4.5 Å) and van der Waals contacts (1.1 times their combined van der Waals radii). Depending on the chosen settings, contact angles for polar and aromatic contacts are also taken into account.
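The default distance criteria listed above can be captured in a small lookup, sketched here. It covers the distance test only; atom and residue typing and the optional angle criteria are left out. The dictionary keys, the function name and the inclusive comparison are assumptions, and the radii in the example are commonly used Bondi values supplied by the caller rather than values from this work.

```python
# Default maximum distances in Å, as listed in the text.
DEFAULT_MAX_DISTANCE = {
    "ionic": 4.5,
    "polar": 4.0,
    "aromatic_stacking": 5.5,
    "cation_pi": 6.6,
    "hydrophobic": 4.5,
}
VDW_SCALE = 1.1  # van der Waals: 1.1 times the combined radii


def within_cutoff(contact_type, distance, vdw_radii=None):
    """Return True if `distance` (Å) passes the default cutoff for the type.

    For "van_der_waals", `vdw_radii` must hold the two atomic radii (Å).
    Whether the limit is inclusive is an assumption of this sketch.
    """
    if contact_type == "van_der_waals":
        if vdw_radii is None:
            raise ValueError("van der Waals contacts need the two atomic radii")
        return distance <= VDW_SCALE * sum(vdw_radii)
    return distance <= DEFAULT_MAX_DISTANCE[contact_type]


print(within_cutoff("polar", 3.9))
print(within_cutoff("polar", 4.1))
# Carbon (1.70 Å) and oxygen (1.52 Å) radii give a limit of 1.1 * 3.22 Å.
print(within_cutoff("van_der_waals", 3.5, vdw_radii=(1.70, 1.52)))
```

Passing the distance test marks an atom pair only as a candidate contact of that type; the atom and residue type checks described in the text decide which types apply at all.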

Transmembrane helix rearrangement plots.
We developed a tool and 2D plot for TM1-7 segment movement at the extracellular end, cytosolic end and membrane mid, respectively, in our webserver for comparative structure analysis (https://review.gpcrdb.org/structure_comparison/comparative_analysis). TM1-7 helix axes are defined based on the three most terminal (of the two ends) or membrane mid generic residue positions estimated from the average GPCR structure placement in the membrane according to the Orientations of Proteins in Membranes database 25. The TM helix rotation is calculated as the difference between structure sets 1 and 2 (here inactive and active templates) in the average angle between: (1) a line from the TM1 (least moving TM) axis at position 1x46 (class A residue number, located near the middle of TM1 with respect to the membrane), through the 7TM bundle axis, calculated as the average axis through all TM helices through the midpoint of the receptor bundle; and (2) lines from the axis of the given TM helix through each Cα atom of the three above residue positions. The representation of the seven TM helices is projected onto a plane whose normal is the average of the vectors from each of the seven TM helix axes.

Snakeplot topological mapping of determinants to functional sites.
We developed a snakeplot mapping contact residue positions. We integrated all ligand (proteins, peptides, small molecules and so on) and Gα protein interacting positions from all 488 structures released before 1 November 2020. These new data points covered 408 and 125 residue positions, respectively, for the GPCR superfamily. Ligand binding positions were considered orthosteric and allosteric if above and below the membrane mid (using the Orientations of Proteins in Membranes database, see above), respectively, except in class C where all such positions in the 7TM are allosteric because its orthosteric ligands bind exclusively in the N-terminal domain. For positions with both an allosteric ligand and G protein interaction, precedence is given to the latter in the snakeplot data mapping.

State-stabilizing mutation design tool.
The 'State-stabilizing mutation design tool' (https://review.gpcrdb.org/mutations/state_stabilizing) lists residues with the most frequent state-specific contacts. For each residue determinant, the tool provides a rank as well as one or more suggested amino acid(s) to remove or repel the given interaction. The suggestions are based on residue contacts observed in inactive versus active state structures of a GPCR class, while uniquely ranking each residue position by all its contacts instead of a single residue pair. For each GPCR class, the 30 residue positions suggested for mutagenesis have the largest difference in contact frequency sums (inactive versus active state) and are therefore hypothesized to stabilize a single state the most. For each receptor and state, two complementary rationales are presented: (1) remove residues stabilizing the undesired state (alanine mutation), and (2) introduce residues stabilizing the desired state (consensus amino acid from the structures in the desired state). Together, this allows the fine-tuning of receptor activity by exploiting state determinants that stabilize only a single state (not both) and by (re-)introducing consensus amino acids that form many state-specific contacts in the GPCR class but are missing in the receptor of interest. For each suggested residue mutation position, the tool also presents already known experimental mutation effects. These include literature mutations with effects on ligand activity (34,648 data points) and structure construct mutations (540 data points) that affect primarily thermostability. They also include receptor expression-increasing mutations that are subsets of these two datasets (483 and 173 data points, respectively).
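The position-level ranking described above can be sketched in a few lines. The sketch assumes contact frequencies are available as percentages per residue pair for the inactive and active structure sets; it sums them per position, takes the inactive minus active difference and ranks by absolute size. The function name, the data layout and the toy values are hypothetical.

```python
from collections import defaultdict


def rank_state_determinants(contact_freqs, top_n=30):
    """Rank positions by the net difference of their summed contact frequencies.

    contact_freqs maps (position_a, position_b) to a tuple
    (inactive_frequency, active_frequency) in percent.
    Returns (position, difference) pairs; a positive difference means the
    position makes more contacts in inactive structures, a negative one in
    active structures.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for (pos_a, pos_b), (inactive, active) in contact_freqs.items():
        for pos in (pos_a, pos_b):
            sums[pos][0] += inactive
            sums[pos][1] += active
    diffs = {pos: ina - act for pos, (ina, act) in sums.items()}
    # Largest absolute difference first; ties broken by position label.
    ranked = sorted(diffs.items(), key=lambda item: (-abs(item[1]), item[0]))
    return ranked[:top_n]


# Hypothetical frequencies for six placeholder positions p1-p6.
toy = {
    ("p1", "p2"): (90.0, 5.0),
    ("p1", "p3"): (0.0, 80.0),
    ("p3", "p4"): (10.0, 70.0),
    ("p5", "p6"): (60.0, 55.0),
}
print(rank_state_determinants(toy, top_n=3))
```

In the toy data, p1 takes part in one strongly inactive-specific and one strongly active-specific contact, so its net difference is small and it ranks low. This is the behavior that distinguishes a net sum over all contacts from a ranking based on single residue pairs.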
For each mutant position, the overall supporting data types are presented as the sum of: (1) presence of ligand activity-altering mutations (more than fivefold effect in at least two receptors), (2) thermostabilizing mutations and expression-increasing mutations from (3) structure constructs (non-quantified data) or (4) ligand activity studies (>30% increase, minimum of two receptors). The tool automatically incorporates new structural templates, including for the classes C and F, which do not yet have supporting data in GPCRdb.
GPCRdb has previously provided thermostabilizing mutations using rule-based sequence rationale and inference of structure construct mutations, integrated in a structure construct design tool (https://gpcrdb.org/construct/design and ref. 3). The Katritch 26 and Vaidehi 27 groups have combined similar information (the latter also dynamics) into machine learning predictors. Furthermore, the Vaidehi group earlier combined receptor models with an energy function 28; however, both of its approaches limit their mutation suggestions to alanine. Our new state-stabilizing mutation design tool differs by giving direct access to all pregenerated mutation suggestions for all human GPCRs and by going further in providing a data-driven rationale for mutations stabilizing a given receptor state, because it is founded on templates from all GPCRs for which a structure has been determined, and new GPCR structures are added to GPCRdb monthly. Whereas established methods focus on only the desired state, this tool also removes residues stabilizing the undesired state. The importance of validating the state-specific nature of a mutation across both states is illustrated by the fact that a majority of agonist-bound crystal structures of GPCRs are in the inactive state 3. It is therefore recommended to measure the effect of a given mutation on an inactive versus active state proxy, such as the effect on binding affinity or thermal stability in the presence of an inverse agonist and agonist, respectively. Furthermore, our analysis of determinant overlap showed that the receptor state is commonly associated with not just thermal stability, but also ligand activity (binding and/or efficacy) and receptor expression at the cell membrane, which benefits from a more stable protein (Extended Data Fig. 1).
Hence, state-stabilizing mutations may have an underappreciated utility across pharmacology, biophysics and structural biology, such as dissection of signaling bias determinants 29 or fine-tuning of receptor basal activity 30 , ligand assay sensitivity and signal window 31 or complexation with G proteins 3 or other effectors and receptor activity modulatory proteins.
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
All data are available in GPCRdb (https://review.gpcrdb.org) and GitHub (https://github.com/protwis/gpcrdb_data). Documentation is available at https://docs.gpcrdb.org.

Extended Data Fig. 1 | State-stabilizing mutation design tool, residue positions and experimental data. a, The 'State-stabilizing mutation design tool' presents data-driven suggestions of mutagenesis experiments for all human GPCRs (https://review.gpcrdb.org/mutations/state_stabilizing). The tool ranks receptor positions by calculating a net sum of residue contacts expected to be gained or removed upon mutation. b, Suggested state-stabilizing positions for classes A and B1, respectively. These are limited to the 30 generic residue positions with the largest inactive/active state contact sum difference. The rightmost column indicates state stabilizers with high-frequency contacts 5. c, Percent coverage of suggested state-stabilizing versus all other generic residue positions by experimentally determined mutations that are ligand activity-altering (>5-fold effect), thermostabilizing (540 data points) or expression increasing (100% would mean that all determinants or non-determinants, respectively, are covered by experimental mutations). For ligand activity mutations (34,648 data points in GPCRdb), we required an effect in at least two receptors. For class A GPCRs, 27/30 residue positions have experimental support (avg. 1.8 functional associations). In class B1, 8 positions are supported by functional data (avg. 0.3 associations). We compared the percentages of residue positions covered by experimental effects for the class A and B1 determinants suggested in the mutation design tool (top 30) versus all other generic residue positions. This shows a nearly twofold higher representation of such data for suggested determinants than for other generic residue positions in class B1. For class A GPCRs, we find stronger determinant overlaps spanning 2.1-, 2.9-, 3.1- and 8.8-fold ratios for mutations shown to influence thermostability, expression in ligand activity studies, ligand activity and expression of structure constructs, respectively. Notably, the top and third positions for class A GPCRs are two well-characterized residue microswitches, R3x50 and Y5x58, and the second position is a conserved proline causing the hinge of TM7. In both classes, 13 out of 30 (43%) suggested mutagenesis positions are unique to this tool (not in ref. 5).
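The coverage percentages and fold ratios in panel c follow from simple set arithmetic, sketched below under the assumption that determinants, all generic positions and experimentally supported positions are available as collections of position labels. The function names and the toy position sets are hypothetical; only the 27/30 example comes from the legend.

```python
def percent_coverage(positions, supported):
    """Percentage of `positions` that have at least one supporting mutation."""
    positions = set(positions)
    if not positions:
        raise ValueError("no positions to evaluate")
    return 100.0 * len(positions & set(supported)) / len(positions)


def fold_enrichment(determinants, all_positions, supported):
    """Coverage of determinants divided by coverage of all other positions."""
    others = set(all_positions) - set(determinants)
    det_cov = percent_coverage(determinants, supported)
    other_cov = percent_coverage(others, supported)
    return det_cov / other_cov if other_cov > 0 else float("inf")


# 27 of 30 suggested positions with experimental support, as in the legend.
print(percent_coverage(range(30), range(27)))

# Hypothetical toy sets: 4 determinants among 20 positions, 7 supported overall.
determinants = {1, 2, 3, 4}
all_positions = set(range(1, 21))
supported = {1, 2, 3, 5, 6, 7, 8}
print(fold_enrichment(determinants, all_positions, supported))
```

A ratio above 1 means the suggested determinants are covered by experimental mutation data more often than the remaining generic residue positions.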