PPP6R2 is a Novel Candidate Gene for Developmental Dysplasia of the Hip as Revealed by Whole Exome Sequencing of 29 Families of Danish Descent

Developmental dysplasia of the hip (DDH) is a common condition involving instability of the hip with multifactorial etiology. Early diagnosis and treatment are critical as undetected DDH is an important cause of long-term hip complications. Better diagnostics may be achieved through genetic methods, especially for patients with positive family history. Several candidate genes have been reported but the exact molecular etiology of the disease is yet unknown. Results In the present study, we performed whole exome sequencing of DDH patients from 29 families with at least two affected rst-degree relatives. We identied PPP6R2 as a novel DDH gene as two rare missense mutations were identied in three families co-segregating with disease. Mutational burden analysis across the families identied 455 candidate genes, with many genes involved in mechanotransduction, in particular the cilia, the cytoskeleton, the extracellular matrix, and the Notch pathway. Here we report for the rst time a previously uncorrelated gene with DDH, PPP6R2. Taken altogether, the data suggest a polygenic mode of inheritance for DDH, and we propose that impaired mechanotransduction is involved in the etiopathological mechanism. MAF 0.01 (gnomAD database), and used of and presented with low in database and the database of the genomic variants (DGV) ( ≤ 0.02).


Introduction
Developmental dysplasia of the hip (DDH, OMIM #142700) is a common condition characterized by abnormal hip development during infancy and early development, with a wide spectrum of phenotypic presentations [1]. Clinical diagnosis of DDH in older children presents challenges as early features are non-speci c, and the disorder might be at an advanced stage by the time of diagnosis. However, early diagnosis and treatment is critical to achieve the best possible outcome. Early treatment consists in obtaining a stable hip joint with minimal complications as early as possible, which is usually achieved with a brace that secures the hip in a exed and abducted position [2]. Undetected hip dysplasia increases the risk of long-term complications, including pain, secondary osteoarthritis and surgery [2,3].
To investigate the possibility of a shared genetic cause of DDH in our cohorts, we performed whole exome sequencing (WES) of the affected individuals. We reached a mean sequencing coverage of 140.5X. On average, 94.2% of the target regions had a sequencing depth of at least 30X (Additional le 2). There was only one fallout sample with a very low coverage, which we decided to keep as it belonged to a three-member family (family 28), and we could still use at least two related individuals for effective variant identi cation.
We performed genetic analysis of SNVs, indels, and CNVs. The diagram of the general structure of the study is shown in Fig. 1. After alignment and joint variant calling within each family, the resulting les contained on average 50243 variants (Additional le 2). We further ltered to include only coding sequence and splicing variants, with high quality signal as assessed by base call accuracy of at least 99%, a variant allele frequency (VAF) in reads of 25% or more, and by manual check of each variant. DDH is known for incomplete penetrance and we decided not to sequence healthy relatives as controls as it may have confounding effects. As control reference, we used variant frequencies in the general population as listed in the Genome Aggregation Database (gnomAD), based on data from 125,748 exomes and 15,708 whole-genomes of unrelated individuals from all over the world. We kept only the variants with a minor allele frequency (MAF) equal or under 0.01, and with less than ve reported homozygotes in the overall global population according to gnomAD.

Variant distribution and annotation
We found a total of 3332 rare and shared variants in 2798 genes, carried by related individuals across the 29 families. The great majority of the variants were missense (3025), followed by a relatively small number of stop-gain variants (75), splicing mutations (72), and frameshift variants (70) (Additional le 3). Inframe indels accounted for 66 cases, and the rest of the variants involved either 5'UTR premature start codons (9), initiator codons (4), stop losts (4), or disruptive inframe deletions (3).
On average, we found 141 rare shared variants in families with only two members. As expected, families with three members shared a smaller number of variants on average (45) than two-member families (Fig. 2), suggesting a certain amount of false positives. Family 34 had ve members and a relatively high number of rare variants in common between all the affected individuals: 14.
Surprisingly, there were two families, family 2 (two members), and family 35 (three members), which both had only two rare variants shared between the affected relatives. These were actually the same two missense mutations in the PPP6R2 gene: c.1421G>A and c.2345G>A (Table  1). The PPP6R2 gene encodes a regulatory subunit for the serine/threonine protein phosphatase-6 catalytic subunit (PPP6C). Phosphoprotein phosphatases (PPP) are multimeric holoenzymes that modulate mitotic entry and progression; usually their substrate speci city is de ned through association of the catalytic subunit with regulatory subunits [9]. We searched for additional variants affecting genes coding for regulatory and catalytic subunits of PPPs in our data and we found that family 1 also bears the same two variants in the PPP6R2 gene as family 2 and 35. In total, the two PPP6R2 SNVs were detected in seven individuals from three unrelated families. The two SNVs were always inherited together, suggesting a cis positioning on the same allele. List of genes encoding for PPP subunits carrying rare variants that co-segregated with DDH within single families. All these genes encode for regulatory subunits of PPP enzymes, among which the protein phosphatases PP1, PP2, and PP6. In the PPP6R2 gene, the same two missense variants were found in seven individuals from three unrelated families. In family 2 and family 35 these two mutations were the only rare variants shared between the affected individuals of the family. The CADD score is used to evaluate the deleteriousness of a variant. A CADD score of over 20 indicates that a variant is among the 1% most deleterious substitution in the genome, and thus is considered damaging.
Additionally, we used other six tools for evaluating a negative functional impact (SIFT, Polyphen, MutTaster, MutAssessor, FATHMM Pred, and FATHMM MKL). The results of a voting system from these six algorithms is shown. A higher number of tools predicting a variant damaging gives a higher score in the voting system. Abbreviations: PPP: phosphoprotein phosphatases; CADD: combined annotation dependent depletion.
Among other genes encoding for PPP subunits, we found a 5'UTR premature start codon variant in the ANKRD28 gene, which is another regulatory subunit of PPP6C (Table 1); two families bearing two missense PPP2R3C SNVs, and four families with four different missense variants in genes encoding PPP1C regulatory subunits: PPP1R13B, PPP1R18, PPP1R37, and PPP1R9A. Most of these variants were predicted to be somewhat damaging as their CADD score for deleteriousness was over 20.
We searched the literature for known genes associated with DDH (Additional le 4) to see if we found any known variants or known genes in our study group. We found novel variants in previously reported DDH candidate genes in humans CDH7, CX3CR1, DACH1, ESR1, MYH10, NOTCH2, PCNT, POLE, TBX4, and TENM3; while the following genes previously described in dogs had variants in our study group: COL6A3, EVC2, INPP5D, IQCA1, KLH17, LAMA2, NWD1, and OTOG.

CNV analysis
Analysis of CNVs did not detect any common structural variants in regions involving previously reported DDH candidate genes (Additional le 5). Only one CNV variant was detected in multiple families (family 1 and family 22), co-segregating with the disease, and it involved the duplication of exons 19 to 22 of the ANKRD24 gene. ANKRD24, similarly to some PPP6 regulatory subunits, has multiple ankyrin repeats domains; however, its function is currently unknown.

Functional analysis of the variants and candidate genes
To identify the most deleterious variants that may cause DDH in our cohort, we used in silico predictions to lter and select the most damaging, rare, and relevant variants. We included shared variants within families, and used a more stringent ltering for variant frequency in the total population. We selected only variants with a MAF equal or under 0.01, less than 5 homozygotes, and 590 alternate allele counts in the general population, which would be the expected number assuming an autosomal dominant mode of inheritance and an average DDH incidence of 1:3400.
Among the identi ed 3332 rare and shared variants from our study group, 825 of them were predicted to be highly damaging that either had a loss-of-function (LoF) mutation, or were assigned a very high prediction score (5 or 6 algorithms predicting deleteriousness, or a CADD score of over 25) (Additional le 6). Four of these variants occurred in known DDH candidate genes in humans: DACH1: c.2050C>T, MYH10: c.423G>A, NOTCH2: c.6094C>A, and TBX4: c.805G>A [10][11][12][13], while the EVC2: c.3061C>T, the OTOG: c.6943C>T, and the SHC3: c.875A>T gene variants occur in candidate genes previously described in dogs [14,15].
As most of the families had many very damaging variants, we used the PhoRank algorithm to lter further the variants in an attempt to select for the most relevant ones. In Fig. 3 is shown the distribution of very damaging variants across the families and according to PhoRank. The PhoRank algorithm assigns a score, from 0 to 1, to each gene according to closeness to a user-speci ed phenotype (hip dysplasia in this case). For most of the variants the median value of PhoRank was low (under 0.325). However, few variants had a very high score: 35 genes carrying 36 very damaging variants had a PhoRank score over 0.75, indicating closeness to the DDH phenotype ( Fig. 3, while Additional le 7 shows the whole list).
Among these, we found very damaging mutations in three collagen genes: COL12A1, COL2A1, and COL6A2. Mutations in COL2A1 are known to cause spondyloepiphyseal dysplasia and COL2A1 has previously been related to osteoarthritis secondary to DDH [16]. We also identi ed very damaging variants in four other genes involved in chondrodysplasias: BMPR1B, KIF22, MMP9, and NANS. Furthermore, we found three known pathogenic mutations in the genes MAN2B1, IDUA, and ABCA3, of which the IDUA: c.1829-1G>A is a LoF mutation causes mucopolysaccharidosis of type 1 [17], an autosomal recessive disorder that commonly involves the hips, with similar clinical and radiological presentation as DDH [18]. The MAN2B1 mutation is involved in mannosidosis alpha, while the ABCA3 mutation in pulmonary disease.
Importantly, most of the families presented with more than one good candidate variant that co-segregated in the affected members, leading to the assumption of a multiple genetic contribution to the phenotype. On the other hand, families with three and ve affected members (family 5, family 8, family 20, family 28, and family 34), even though they had very damaging variants, none of these had a PhoRank score of over 0.75, indicating that genes currently not connected to DDH may be involved in the pathology in these families.

Mutational burden analysis
Since the number of identi ed DDH candidate genes was high in our analysis, we decided to undertake an alternative approach by investigating which genes were most frequently affected across the families. We ltered as previously for shared variants between the affected members within the families and low variant frequency in the general population, and then we looked for genes affected by these variants in more than one family. We found 455 genes to be affected in at least two families, carrying a total of 958 variants (Table 2 and Additional le 8 showing the whole list of genes). Titin (TTN) was the most affected gene in our study group, as we found nine families and eleven missense variants in the TTN gene, with two of the families carrying two variants. Next in line was cadherin 23 (CDH23), with six families affected and carrying six different variants, among which a LoF initiator codon variant. Seven genes were affected in ve families, 16 genes among four families, and the remaining 62 and 368 genes were found carrying variants among three and two families, respectively. Figure 4 shows a list of the 87 most commonly hit genes as they were found carrying variants in three or more families.
Since our most affected gene (TTN) is also the longest in the genome, we ran a correlation test between gene length and the number of variants, and gene length and the number of families with variants for each gene. A correlation coe cient of r = 0.36 and r=0.33 respectively, suggested that gene length by itself was not a relevant factor in determining how many variants we would nd, or how many families would carry the variants.
Next, we looked in the literature to see if the list of our most affected genes contains known DDH candidate genes. We found NOTCH2, TBX4, and TENM3 known candidate genes each to have variants in two families; while DACH1, CHD7, CX3CR1, ESR1, MYH10, PCNT, and POLE candidate genes had each variants only in a single family (Additional le 4).
Although our top hit genes have not been previously reported in relation to DDH, many of them belonged to the same protein family or had a common function with many known candidate genes. For example, CDH23, an essential protein for inner-ear mechanotransduction and hearing [19], is the second most affected gene in our study and it belongs to the cadherin superfamily of calcium-dependent cell-cell adhesion proteins, as well as the DDH candidate gene protocadherin9 (PCDH9) [11]. Similarly, we found two families with missense variants in the plasma membrane Ca2+ transporting 2 (ATP2B2) gene, which is closely related to the ATP2B4 gene, whose variants were previously described to co-segregate with heparan sulfate proteoglycan 2 (HSPG2) variants in four related DDH individuals [8]. The semaphorin gene SEMA4D is a candidate gene in dogs and in humans, and SEMA5D was previously reported in dogs [20,21]; although we did not nd variants in SEMA4D or SEMA5D, we did nd a SEMA6C variant in three families.

Biological function and candidate pathways
At this point, we reasoned that the DDH phenotype might be a consequence of a dysfunctional pathway, rather than of a dysfunctional single gene. To investigate more systematically which groups of genes and pathways were more frequently affected by variants in our cohort, we performed GO term and pathway analysis taking into consideration only genes affected in two or more families.
The number one hit in the analysis of the enrichment of GO biological process terms was detection of mechanical stimulus, followed by limb morphogenesis (Fig. 5 and Additional le 9). Among other signi cantly enriched GO terms there were many belonging to the extracellular matrix, microtubules, or the cytoskeleton topology and function. GO cellular component they were: dynein and collagen complexes, the costamere, the basement membrane, collagen-containing extracellular matrix, cell-substrate junctions, myo brils, and the cilium. While enriched molecular function terms were ATP-dependent microtubule motor activity, dynein binding, microtubule and motor activity, calcium ion transporting and binding activity, and actin binding.
Signi cantly enriched keywords associated with our group of genes were Usher syndrome as we curiously found variants in four genes known to be causative of the disease including USH2A, CDH23, GPR98, and HARS. Other enriched keywords included: laminin EGF-like domain, basement membrane, dynein, motor protein, actin-binding, cilium, and extracellular matrix in con rm to what already found with GO term and pathway analysis.

Protein interaction analysis
To investigate which protein networks the identi ed genes belonged to, we used STRING (https://string-db.org/) and performed protein interaction network analysis using as input the list of 455 genes affected in at least two families. The resulting protein network was rather large, shown in Fig. 7 (Additional le 10 for high resolution), with several clearly observable smaller clusters, most of which have already been identi ed by GO term and pathway analysis such as laminins and collagens, which together form a bigger network of extracellular matrix proteins, adhesion and cytoskelal proteins, and the Notch signalling pathway. On the gure, there were also visible two clusters with members of the Wnt signalling pathway and the E3-ubiquitin protein ligase complex.
In the overall, the resulting protein interaction network had signi cantly more interconnections than expected, as the protein interaction network enrichment p value was 4.6e-14 (string-db.org). The value was calculated for 453 number of nodes (proteins) in the network, and reaching to 613 edges, as compared to 446 expected edges for a network of a random set of proteins of similar size (the average node degree was 2.71 with an average node clustering coe cient of 0.34). This means that the genes from our list form a larger group of biologically connected proteins than what a random set of genes of the same dimensions would make.

Discussion
DDH is a common musculoskeletal condition in uenced by genetic, mechanical, and environmental factors, resulting in a wide range of phenotypic manifestations. The earlier that DDH is diagnosed, the simpler and more effective are the treatments. However, late detection of instable hips is a persistent problem, and the phenotypic heterogeneity of DDH poses a serious challenge for diagnostics, especially in adults [3,22]. Predictive genetic testing may improve signi cantly diagnostic methods and patient management in families with a positive family history of DDH.
Until now, 34 genes have been related to DDH in humans [8, 10-13, 20, 21, 23-44], sometimes with contradictory results [45]. However, no tight phenotype-genotype correlations have been established and the molecular mechanisms of the disease have just started to be elucidated. To this end, canine hip dysplasia is a relevant natural animal model for DDH [46]. Studies focusing on genetic markers for hip dysplasia in dogs have unraveled over 50 candidate genes, with little overlap with human candidate genes [14,15,[47][48][49][50][51]. It is generally thought that sequence variants in individual genes are the cause of DDH, although a simple single-gene inheritance model is very unlikely to be able to explain the whole complexity of the phenotypic and genetic heterogeneity in DDH [3]. A multiple gene model of inheritance was already proposed in an initial study [7]. Modern sequencing techniques allow now for genome-wide investigations, and several studies using NGS reported many loci/genes in relation to DDH. A two-gene model identi ed by exome sequencing has been proposed for a family carrying HSPG2 and ATP2B4 variants [13]; while Zhu et al. used genome sequencing and found 1344 genes in 10 patients possibly carrying DDH-causative mutations, many of which are involved in cytoskeleton structure and function [30].
In the present study, we investigated exonic and splicing variants in family relatives affected by DDH from 29 families. We identi ed a novel gene, PPP6R2, involved in cell cycle entry and progression, to carry two missense variants co-segregating with the disease in three families. Since the two mutations are inherited together, they are most probably situated in cis, suggesting a founder allele model.
Besides two families carrying only PPP6R2 variants, in most of the other families we identi ed several genes that may be imputable for the disease, suggesting a polygenic mode of inheritance.
In an attempt to select the genes that are more relevant to DDH, we looked for genes carrying variants in multiple families. This approach identi ed two novel genes, TTN and CDH23, as the two most affected genes in our study, presenting with variants in more families than any other gene. Although neither of these genes have been were previously related to DDH, Zhu et al. [30] lists TTN among the DDH-speci c genes in their study group, while CDH23 was associated to DDH in a GWAS study from Yan et al. [52]. Among the DDH-closest genes (according to PhoRank) with the most damaging variants in our study, we found that EVC2, FRAS1, CDC45, and NOTCH2 have variants co-segregating with the disease in two families each, while ABCA3, EP300, TBX4, and IDUA had variants in three families, further strengthening the notion that these genes may be involved in the etiopathogenesis of DDH.
If we considered group of similar genes, the number of families with variants would increase considerably. For example, NOTCH2, which encodes a transmembrane surface receptor, together with other three Notch receptors plays a fundamental role in bone morphogenesis [53].
Besides the two families carrying variants in known DDH candidate gene NOTCH2, we also found three families carrying variants in NOTCH4, and one family with a variant in NOTCH3. Similarly, we found two missense variants in the known DDH gene TBX4, but we also found two families with TBX5 variants. The myosin heavy chain 10 (MYH10) gene is another previously reported DDH gene [10]; we found one family with a very damaging variant in MYH10, but also three families in MYH7B, two families in MYH14, and one family in MYH3. Laminin 2 (LAMA2) is a candidate extracellular matrix gene described in hip dysplasia in dogs [48]. Besides nding two families with LAMA2 variants, we also found four families with LAMA1 variants, two families with LAMA5 variants, and one family with a LAMA3 variant.
We applied pathway analysis to provide insights into which groups of genes are the most commonly affected. As expected, we found the Notch transcription pathway, extracellular matrix-receptor interactions, and laminins to be among the signi cantly enriched pathways. Notch is an ancient pathway that regulates critical processes during embryological development and it is involved in skeletal development and homeostasis [53]. It is the third most important pathway for osteoblastogenesis, next to the Wnt signalling pathway and the bone morphogenetic protein (BMP) pathway, and these pathways interact between each other in osteoblast differentiation [54]. The Notch pathway is also involved in mechanotransduction [55,56], as mechanical force is required for ligand-induced Notch activation [57,58].
Many enriched GO-terms we found were related to the cytoskeleton and microtubules, similarly to what previously described by Zhu et al. [30].
More speci cally, we identi ed the following GO molecular function terms as dominant: ATP-dependent microtubule motor activity, dyneins, actin-binding, and calcium binding. Collagens, the extracellular matrix, the basement membrane, the costamere, the axoneme, adherens juctions, the cilium, myo brils, and cytoskeleton were most frequent topological GO term associations. Interestingly, the most enriched biological process GO term was detection of mechanical stimulus.
Mechanical forces in uence the growth and shape of virtually every tissue and organ in our bodies, and through mechanotransduction these forces are translated into molecular processes and gene expression. Integrins and cadherins mediate the adhesion of cells to the extracellular matrix and their distortion activates key molecules on the cell membrane, such as ion channels and other signal-transduction molecules, which trigger signalling cascades leading eventually to changes in gene expression. An alternative model suggests also that mechanical stress can be conveyed directly through the cytoskeleton to a nucleic scaffold of actin, myosin, and nuclear titin, which detects the nuclear envelope stress and activates genes within seconds [59,60]. . Many ciliopathies involve diverse skeletal malformation, suggesting that cilia are indispensable for regulating bone development and homeostasis. The ciliary membrane hosts many receptors for detecting various stimuli including Notch receptors. Cilia are an enriched GO term in our study as we nd 26 genes in this category to have variants cosegregating with disease in more than two families including the previously reported candidate gene EVC2. Furthermore, variants in ciliary protein genes RP1L1 and PKD1 were found in ve different families co-segregating with disease, as well as variants in four genes causing Usher syndrome involved in proper organization of stereocilia (CDH23, GPR98, HARS, USH2A), among which is our second most frequently affected gene CDH23.
Curiously, PPP6, whose regulatory subunit PPP6R2 gene we identify as the novel DDH candidate gene, is a speci c regulator of the G0 mitotic kinase Aurora A (AURKA) [63]. PPP6 inactivates AURKA through de-phosphorylation during spindle formation. However, an emerging role of AURKA is the regulation of ciliary assembly [64]. AURKA is abundant in G0 where it triggers ciliary resorption and AURKA removal is a likely prerequisite of effective ciliogenesis. An inappropriately active AURKA could impair the proper resorption of the cilia prior to mitosis and possibly disrupt the cell cycle.

Conclusion
We identify PPP6R2 as novel DDH candidate gene, and we nd very damaging variants in the following known candidate genes from human and canine hip dysplasia: DACH1, MYH10, NOTCH2,TBX4, EVC2, OTOG, and SHC3, which has important implications for the development of diagnostic methods for DDH. Our study also supports the hypothesis that multiple genes contribute to the development of DDH, as we found many families with several high damaging variants co-segregating with the disease. Pathway analysis of more frequently affected genes in our study group identi ed transduction of mechanical stimulus as the primary enriched biological process, and we found a remarkable number of variants in genes involved in cilia. We also found variants in a signi cant number of genes belonging to the Notch pathway, as well as the extracellular matrix proteins, microtubules, cytoskeleton, collagens, and laminins. Most of these proteins participate to the transduction of the mechanical stimulus at some level. This provides a link that could explain in molecular terms how environmental and mechanical stimuli, e.g. tight swaddling or a harness, determine the insurgence and/or the treatment of DDH. We conclude that DDH is a polygenic disorder, and we propose that impaired mechanotransduction is involved in the pathophysiological mechanism of the disease. Further studies are needed to con rm and de ne more in detail the biological signi cance of these variants respect to disease.

Methods
General overview of the project We carried out genetic analysis of 66 affected individuals of Danish origin, from a total of 29 families, with con rmed DDH diagnosis through radiological and clinical examination, ascertained through the Danish periacetabular osteotomy (PAO) Database at Aarhus University Hospital [65]. We focused only on affected individuals within the families, as DDH is known for incomplete penetrance and including healthy relatives as controls may have confounding effects. As control we used MAFs of each variant in the total population according to the gnomAD database v2.1.1 (GRCh37/hg19), previously known as the Exome Aggregation Consortium (ExAC).

Whole exome sequencing
After obtaining written informed consent, we collected blood EDTA samples and performed whole exome sequencing (WES). Genomic DNA was isolated using standard methods. The libraries were prepared on a Beckman Coulter Biomek 4000 Laboratory Automation Workstation

Functional analysis of variants
To select for the most damaging and DDH-related variants we further ltered the variants keeping only those with less than 590 alternate allele counts (the expected allele count assuming an autosomal dominant mode of inheritance and given the average DDH incidence of 1:3400 [3]) in gnomAD. We used the Combined Annotation Dependent Depletion (CADD Scores 1.4) method to score the variants for deleteriousness, PHRED scaled scores were used [67]. The CADD algorithm includes conservation metrics, functional genomic data, exon-intron boundaries, and protein functionality scores. A score equal or greater of 20 indicates that a variant is predicted to be among the 1% of the most deleterious substitutions. We used a CADD score of over 25 to select for the most damaging variants in our study group. We further used a dbNSFP tool [68] that incorporates six different algorithms to evaluate the functional impact of a variant, including: SIFT, Polyphen, MutTaster, MutAssessor, FATHMM Pred, and FATHMM MKL. The most damaging variants were selected based on a voting system where ve or all six of these algorithms predicted a negative functional impact. To estimate which variants were more likely to be involved in DDH we used the PhoRank algorithm and hip dysplasia as user-speci ed phenotype. PhoRank assigns a score to each gene based on their proximity to a userspeci ed phenotype, using Gene Ontology (GO) and Human Phenotype Ontology (HPO) [69]. The genes are ranked from 0 to 1 where 1 indicates the closest direct, or through a shared gene, relationship.
Genetic overlap, GO term, pathway, and STRING analysis We created an R-script to make a list of all the genes and variants we found in the families and to lter for those that occur in multiple families. They were calculated using the Benjamini-Hochberg procedure for multiple testing correction within each category. The protein interaction network was analyzed and visualized using STRING. STRING is a database of physical and functional, known and predicted, protein-protein interactions [70]. The STRING online resource performs network and enrichment analysis, visualizes protein networks, and reports a statistical analysis of enrichments of connections between proteins, and functional subsystems.

Declarations
Ethics approval and consent to participate The study was performed in line with the principles of the Declaration of Helsinki, and it has been approved by The Regional Committee on Health Research Ethics for Southern Denmark (S-20170055). Written informed consent was obtained from each patient included in the study prior to sample collection.

Consent for publication
Not applicable.

Data availability statement
In order to comply with the ethical approval, anonymized data supporting the ndings of this study is available within the main article and its supplementary data, containing all the rare and shared variants within the families. Raw sequencing data are not available for broad data sharing because of the ethical and legal restrictions on data usage, since the patients did not provide informed consent that covers deposition to public repositories, sharing, and additional use of their genetic data. Please contact the corresponding author (Maja Dembic, Maja.Dembic@rsyd.dk), if you would like to request the raw data; it will however require an additional application to the ethical committee. All software used is described and cited throughout the manuscript. Custom scripts are available at https://github.com/MaDemb/DDH.

Competing interests
The authors declare no con ict of interest.

Funding
Funding support for this work has been received from 'The Danish Rheumatism Association' (grant number R101-A1997) and the "Bevica Fonden".

Authors' contributions
MD performed data analysis, genetic data interpretation, and wrote custom scripts and the manuscript. LvBA performed data analysis, and wrote customs scripts. MJL provided data analysis support, and participated in the discussions on the overall and detailed study plan. IM and KS performed clinical examinations and did the selection of patients; JH participated in the patient collection, designed the research project and supervised it. All authors discussed the results and commented on the manuscript.

Figure 1
General overview of the project We performed whole exome sequencing of 66 patients from 29 families. After alignment and variant calling, the variants were ltered according to frequency and quality, and were manually checked for artefacts. We undertook a single family approach performing genetic analysis of the variants within each of the family, and a transversal approach across families identifying genes occurring with variants in more than one family. Abbreviations. SNVs: single nucleotide variants; InDels: inserts and deletions; CNVs: copy number variations; MAF: minor allele frequency; UTR: untranslated region; VAF: variant allele frequency (in reads). GO: Gene Ontology.

Figure 2
Distribution of the variants across the families In the gure is shown a histogram representing the distribution of 3332 rare variants across the families. These included all splicing and coding sequence variants, with a MAF ≤ 0.01, and shared between the affected individuals within a family. Most of the families (dark blue) had only two members, ve families had three members (lighter blue), while family 34 had ve members. Family 2 and 35 had only two variants, discovered to be the same ones.

Figure 3
DDH relevant genes with the most damaging mutations In the gure is shown a box-plot of the 825 most damaging variants and their distribution across the families. The most damaging variants found in the affected individuals from 29 families were selected based on a high prediction of deleteriousness. The variants were either loss-of-function (LoF), had a CADD score of over 25, or were predicted to be damaging by 5 or more in silico tools. The box-plot shows the distribution of PhoRank values of the variants across the families. A PhoRank of over 0.75 predicts a related function of the gene respect to hip dysplasia phenotype using Gene Ontology (GO) and Human Phenotype Ontology (HPO). The median line shows the median, the hinges are the rst and third quartiles, and the whiskers extend to the largest and smallest value, but not further than 1.5 * IQR ( rst and third interquartile range) from the hinge. The names of the genes carrying very damaging variants and a PhoRank score of over 0.75 are visible over the red line. The most frequently affected genes in our study Histogram with the 87 genes bearing rare variants co-segregating with DDH in multiple families and the respective family count. TTN is the most affected gene occurring with variants in nine families. For simplicity, only the genes having variants in three or more families are shown.

Figure 5
GO term analysis of the most affected genes in our study In the gure is shown an enrichment analysis made with STRING (string-db.org) of the GO terms within biological process (a), cellular component (b), and molecular function (c) of the 455 genes carrying variants in two or more families. The strength describes the enrichment effect and it is measured as Log10 (observed / expected). The ratio is between the number of observed genes from our network in a speci c category, and the expected number of genes falling in the same category from a random network of the same size. False discovery rates (FDR) are p-values corrected for multiple testing using the Benjamini-Hochberg procedure. Enrichments with an FDR of under 0.05 are considered signi cant and displayed in the gure.  Protein interaction network of the most frequently affected genes In the gure is shown a protein interaction network calculated using the STRING software. The spherical nodes in the network are single proteins; loose nodes with no connection to other proteins are not visualized in this representation. The lines represent functional and physical protein associations, while the thickness of the lines represents the con dence level, with the thickest representing maximum con dence. Some of the clusters visible as an aggregation of high con dence connections between nodes are highlighted in red. ECM: extracellular matrix.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.