Phenotypic characterization of the study cohort. In total, 5652 individuals with a suspected rare disorder were enrolled in TRANSLATE-NAMSE by CRDs at ten German university hospitals over a period of three years (2018–2020). MDTs including medical genetic and domain-specific clinical expertise evaluated information from the patients’ health records and family histories and then made recommendations on how to proceed for each individual.
Here, we report on the patients (268 adults and 1309 children, total: 1577) who underwent ES following the recommendation of the MDT. Clinical features of these patients were encoded using the Human Phenotype Ontology (HPO) terminology, resulting in an average of five HPO terms per patient (Fig. 1a, Supplemental Fig. 1)9. In addition, a subset of 211 individuals consented to the analysis of their portrait photos by an artificial intelligence tool (PEDIA subset).
On the basis of the leading presenting symptom, each case was assigned to one of five major disease groups (Fig. 1b). The majority of children presented with neurodevelopmental disorders (51%), and the majority of adults with neurological or neuromuscular disorders (41%). Smaller proportions of cases presented with organ malformation, endocrine/metabolic, and immune/hematologic disorders. This is comparable to other large cohorts of undiagnosed patients10. However, the challenge of these assignments to disease groups can be illustrated by a comparison of the annotated phenotypic features: the correlation of higher-order HPO terms between the disease groups is considerable, indicating a high phenotypic overlap (Fig. 1c, Supplemental Material). This can also be visualized in the clinical feature space in which individuals are positioned according to their original HPO annotations: while most patients of the same disease group are close together, their clusters partially overlap (Fig. 1c). For instance, patients with neuromuscular or neurodevelopmental disorders are often so similar phenotypically that an assignment to a disease group seems rather arbitrary. We therefore analyzed the diagnostic yield not only per disease group, but also based on phenotypic features.
Diagnostic yield and molecular findings. All lab results including rare, potentially causative variants were analyzed and discussed by the MDTs in context of the presenting phenotypes and facial dysmorphic features. All such variants were classified according to standard guidelines11,12, and their allelic contribution to disease was assessed.
In total, a molecular diagnosis could be established in 494 of the 1577 patients (31%) because pathogenic or likely pathogenic variants were found that explained the phenotype fully or partially. The diagnostic yield was 5 percentage points higher for children (32%) than for adults (27%), most probably indicating the higher likelihood of a monogenic disorder if the age of onset is early in life (Fig. 2a).
Patients with a neurodevelopmental disorder were more than twice as likely to receive a molecular diagnosis as patients with disorders of, for example, the endocrine system (Fig. 2a). However, since assignment to disease groups can be ambiguous, we also analyzed the influence of all phenotypic features on the diagnostic yield in a multivariate regression analysis.
Least absolute shrinkage and selection operator (LASSO) analysis yielded “dysfunction of higher cognitive abilities”, “hematological abnormalities”, and “ataxia” as very influential parameters for identifying a disease gene (Fig. 2b). Although our model was trained on the TRANSLATE-NAMSE cohort, it was also validated on an independent cohort with comparable results. We, therefore, made the model available as a web service that can be used to estimate the diagnostic yield of genetic testing given the phenotypic features of a patient (http://tnamse.de/)13.
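The idea behind such a model can be illustrated with a minimal sketch. It assumes an L1-penalized logistic regression fitted by proximal gradient descent on binary phenotype indicators. The function names, the penalty strength, and the toy data are hypothetical and are not taken from the study, whose actual model and features are described in the cited reference.

```python
import math

def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a coefficient toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_yield(intercept, weights, features):
    """Estimated probability of a molecular diagnosis for one patient."""
    return sigmoid(intercept + sum(w * x for w, x in zip(weights, features)))

def fit_l1_logistic(X, y, penalty=0.05, step=0.5, iterations=2000):
    """Minimize mean log-loss + penalty * sum(|weights|); intercept unpenalized."""
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    intercept = 0.0
    for _ in range(iterations):
        # Residuals (prediction minus outcome) at the current parameters.
        residuals = [predict_yield(intercept, weights, row) - label
                     for row, label in zip(X, y)]
        intercept -= step * sum(residuals) / n
        for j in range(p):
            gradient = sum(r * row[j] for r, row in zip(residuals, X)) / n
            # Gradient step followed by shrinkage (the LASSO part).
            weights[j] = soft_threshold(weights[j] - step * gradient,
                                        step * penalty)
    return intercept, weights

# Hypothetical toy cohort. Column 0: an informative phenotype indicator,
# column 1: an indicator unrelated to the outcome. Label 1 = solved case.
TOY_FEATURES = ([[1, 1]] * 3 + [[1, 0]] * 3    # feature present, solved
                + [[1, 1]] * 1 + [[1, 0]] * 1  # feature present, unsolved
                + [[0, 1]] * 1 + [[0, 0]] * 1  # feature absent, solved
                + [[0, 1]] * 3 + [[0, 0]] * 3) # feature absent, unsolved
TOY_SOLVED = [1] * 6 + [0] * 2 + [1] * 2 + [0] * 6

intercept, weights = fit_l1_logistic(TOY_FEATURES, TOY_SOLVED)
print("coefficients:", weights)
print("estimated yield with feature:", predict_yield(intercept, weights, [1, 0]))
```

In this toy setting the penalty keeps the coefficient of the uninformative indicator at zero and shrinks the informative one, which is the selection behavior that makes LASSO useful for ranking phenotypic features.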
The diagnostic yield in patients who agreed to an evaluation of their molecular and clinical data, including analysis of portrait images by artificial intelligence, was 42%. Although this is substantially higher than for the remaining TRANSLATE-NAMSE cohort, the difference is most probably explained by an ascertainment bias, since patients with facial dysmorphism were more likely to participate in the PEDIA subset14. However, of note is the high sensitivity of the fully automated prioritization pipeline that lists the correct disease gene among the top ten suggestions in four out of five cases7 (Fig. 2c). The support of the PEDIA workflow could further be increased by including gestalt scores of a recent algorithmic update that focuses on ultra-rare phenotypes8. The AI support not only speeds up data analysis but also yields additional evidence for variant classification, particularly if the gestalt is quantified as highly similar in a phenotype of high distinctiveness11,15.
For 18 cases that were classified as uncertain or unsolved after initial ES, functional assays such as analysis of the methylome (n = 4), transcriptome (n = 11), or proteome (n = 3) were conducted for further classification. Proteome analyses were particularly informative in the three cases where this strategy was used, highlighting the importance of variant validation strategies in diagnostics (Supplementary Case Reports)16–18. Epigenetic signatures could clarify the status of de novo missense variants as likely to be benign or pathogenic, as exemplified by a case with a missense variant in KMT2D (Supplementary Case Reports)19,20.
Mode of inheritance and recessive disease burden. In accordance with previous reports on comparable cohorts, 214 (44%) of the solved cases were due to de novo variants (Fig. 3a). In three families, establishing the diagnosis was particularly challenging due to mosaicism (Supplemental Material). In one of these families, the same pathogenic variant in PUF60 was identified as the cause of developmental delay in two affected brothers. Because the variant was not detectable in the exome data of either parent, the presumed gonadal mosaicism could only be inferred from the family history. Previously reported proportions of parental mosaicism below 1% should therefore only be regarded as a lower bound21–23.
The second-largest fraction of diagnoses was due to an autosomal recessive (AR) mode of inheritance, with autozygosity being an important covariate. We computed the homozygosity and used a threshold of 2% to assign patients to a group of high (n = 126) or low (n = 262) autozygosity24. Although there was no significant difference in the diagnostic yield between the groups (low autozygosity 29% vs. high autozygosity 28%), the composition of the modes of inheritance differed considerably (Fig. 3b); the relative contribution of homozygous variants was considerably higher in the high autozygosity group (44%) than in the low autozygosity group (3%) (OR 15.9). In contrast, the proportion of de novo variants contributing to disease was 50% in the low autozygosity group compared with 22% in the high autozygosity group (OR 2.3).
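The grouping and the comparison between groups can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: autozygosity is approximated here as the fraction of the autosomal genome covered by runs of homozygosity (ROH), the autosomal length is a rough illustrative value, and all segment coordinates and counts are hypothetical.

```python
# Rough autosomal genome length in base pairs (illustrative assumption).
AUTOSOME_LENGTH_BP = 2_875_000_000

def autozygosity_fraction(roh_segments, genome_length=AUTOSOME_LENGTH_BP):
    """Fraction of the genome covered by ROH segments given as (start, end)."""
    return sum(end - start for start, end in roh_segments) / genome_length

def autozygosity_group(fraction, threshold=0.02):
    """Assign 'high' if the fraction exceeds the 2% threshold, else 'low'."""
    return "high" if fraction > threshold else "low"

def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds ratio from the four cells of a 2x2 table."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)

# Hypothetical patients: ROH segments in base-pair coordinates.
patient_a = [(0, 60_000_000), (0, 40_000_000)]   # 100 Mb of ROH in total
patient_b = [(0, 10_000_000)]                    # 10 Mb of ROH in total

for name, segments in (("A", patient_a), ("B", patient_b)):
    fraction = autozygosity_fraction(segments)
    print(name, round(fraction, 4), autozygosity_group(fraction))

# Hypothetical 2x2 table: diagnoses due to homozygous variants vs. other
# diagnoses, in the high and low autozygosity groups.
print(odds_ratio(20, 30, 5, 45))
```

Note that an odds ratio computed from a 2x2 table of counts generally differs from the simple ratio of two percentages, so the cell counts are needed to reproduce a reported value.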
Because the de novo mutation rate depends on parental age but not on autozygosity, the disease prevalence attributable to such variants should be comparable in both groups and can be used for normalization (Fig. 3c). For an inbreeding coefficient above 2%, this suggests a recessive disease burden seven times that for those with lower inbreeding coefficients, which is in agreement with previous reports24–26.
In eight individuals, representing roughly 2% of all solved cases, we reported two molecular diagnoses of distinct or overlapping disease phenotypes in a single family, which is in agreement with earlier reports27. This group was also enriched for high autozygosity and recessive disorders, which also concurs with earlier reports (Supplemental Table dual diagnosis)28.
In addition to the relatedness of two healthy parents, a more accurate estimate for the disease risk in offspring needs to incorporate the number of heterozygous pathogenic variants in recessive genes, which can vary considerably depending on demographics29–32. We found that 89 of the 116 variants that we reported in recessive disease genes would also have been classified as pathogenic if they had been identified in healthy individuals33. That also means that ES as an expanded carrier screen would have pointed to an elevated risk in 77% of the couples in our cohort that had an offspring affected by a recessive disease34.
Novel diagnostic-grade genes and candidates. For all 494 individuals with a molecular diagnosis, we reported in total 546 distinct pathogenic or likely pathogenic variants in 364 different diagnostic-grade genes (DGGs) (Supplemental Material). We estimated the incidences of the associated disorders and tracked the years in which they were first described. As a proxy for incidence, we ordered all known DGGs according to the number of case submissions in ClinVar and plotted the number of variants in the TRANSLATE-NAMSE cohort corresponding to these genes (Fig. 4a). The first quartile contains clinical reports of 24 patients in 10 different DGGs, whereas the last quartile features 113 DGGs, with most diagnoses in the latter group corresponding to only a single patient. In comparison to other cohorts of comparable size, this distribution is shifted to the right, suggesting a significant enrichment for ultra-rare disorders in the TRANSLATE-NAMSE cohort1,5,35,36 (Supplemental Material). Almost half of the diagnoses that we established have only become possible in the last decade (Fig. 4b), demonstrating the huge progress in medical genetics due to high-throughput sequencing and emphasizing the importance of data reanalysis37,38.
Cases in which no diagnosis could be established in a known DGG were included in national and international studies for the discovery of novel disease etiologies via the MatchMaker Exchange (MME) Network39,40. Variants with a high likelihood of being disease-causing, for example those with loss of function or high pathogenicity scores or that arose de novo, were shared through MME to identify similar patients41,42. In 64 cases, we identified indications for novel disease relationships in 55 genes, most of them related to a neurodevelopmental phenotype. Of this set, 23 candidates achieved medium evidence and 32 high evidence and are currently under further investigation. Fifteen genes have subsequently reached DGG status.
We briefly describe a few exemplary cases below, most of which have already been published in detail elsewhere43–52, and provide a comprehensive list of all patients in the Supplemental Material. In SMARCA5, a gene coding for a chromatin remodeler, we identified de novo variants in two patients with neurodevelopmental delay and similar dysmorphic features. The phenotype-gene association was strengthened by the identification of ten additional cases from other cohorts, and rescue experiments with wildtype transcripts in Drosophila suggested a hypomorphic effect of the variants47. In another individual with learning disabilities, autism, dystonia, and intention tremor, we identified a de novo missense variant in KCNN2, which encodes a small-conductance calcium-activated potassium channel. Interestingly, a preexisting rat model with a missense substitution identical to that found in the affected individual partially mirrors this phenotype with abnormal locomotor activity and tremor. The identification of nine additional individuals from other cohorts and results of functional analysis of the variants on channel function established KCNN2 as a dominant disease gene for a neurodevelopmental movement disorder50. A recognizable syndrome with multiorgan manifestations could be delineated for biallelic truncating variants in MAPKAPK5 (MAPK-activated protein kinase 5). Patients presented with severe developmental delay, variable brain anomalies, congenital heart defects, facial dysmorphism, and a distinct type of synpolydactyly with an additional hypoplastic digit between the fourth and fifth digits of hands and/or feet43.
Aside from novel neurodevelopmental disorders, OAS1, which encodes a type I interferon–induced, intracellular double-stranded RNA (dsRNA) sensor that is required for antiviral defense, was established as a DGG. We identified a de novo gain-of-function variant that caused dsRNA-independent activity of OAS1 in a patient with immunodeficiency, pulmonary alveolar proteinosis, and phospholipid accumulation. The hyperinflammatory disorder was cured by allogeneic hematopoietic cell transplantation45. A homozygous frameshift variant in CYHR1, encoding a protein with to-date-unknown function but high cerebral and cerebellar expression, was identified in a girl from non-consanguineous parents who presented with global developmental delay and hyperreflexia. Interestingly, the variant was inherited from the unaffected father and was homozygous in the patient due to paternal uniparental isodisomy of chromosome 8, as confirmed by exome-wide SNP analysis. RNA sequencing revealed significantly lower CYHR1 gene expression, highlighting CYHR1 as an excellent candidate gene for AR non-syndromic intellectual disability.
In comparison with pathogenic variants in previously known DGGs, there was a higher proportion of missense variants in our candidate gene set, most likely because classification is more challenging (Fig. 4c). In addition to standard pathogenicity scoring approaches, we therefore also made use of the structural predictions that recently became available from AlphaFold 2 and found high conservation of amino acids that were closer than six Angstroms to the mutated site, potentially indicating a higher likelihood of pathogenicity (Supplemental Material)53.
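The neighborhood analysis can be sketched in a few lines. This is a simplified illustration, assuming that each residue is represented by a single coordinate (for example its C-alpha atom) taken from a predicted structure and that a per-residue conservation score is available. The residue numbers, coordinates, and scores below are hypothetical, and the study's actual procedure is described in its Supplemental Material.

```python
import math

def residues_within(coords, site, cutoff=6.0):
    """Residues whose representative atom lies closer than `cutoff`
    Angstroms to the mutated site (the site itself is excluded)."""
    center = coords[site]
    return sorted(residue for residue, xyz in coords.items()
                  if residue != site and math.dist(center, xyz) < cutoff)

def mean_neighbor_conservation(coords, conservation, site, cutoff=6.0):
    """Average conservation score over the spatial neighbors of the site;
    returns None if no residue lies within the cutoff."""
    neighbors = residues_within(coords, site, cutoff)
    if not neighbors:
        return None
    return sum(conservation[r] for r in neighbors) / len(neighbors)

# Hypothetical coordinates in Angstroms, keyed by residue number.
toy_coords = {
    10: (0.0, 0.0, 0.0),   # mutated site
    11: (3.8, 0.0, 0.0),   # sequence neighbor, 3.8 A away
    12: (7.6, 0.0, 0.0),   # 7.6 A away, outside the cutoff
    50: (0.0, 5.0, 0.0),   # distant in sequence but close in space
    51: (0.0, 9.0, 0.0),   # outside the cutoff
}
# Hypothetical conservation scores between 0 (variable) and 1 (invariant).
toy_conservation = {10: 1.0, 11: 0.9, 12: 0.2, 50: 0.7, 51: 0.1}

print(residues_within(toy_coords, site=10))
print(mean_neighbor_conservation(toy_coords, toy_conservation, site=10))
```

Working in three-dimensional space rather than along the sequence captures residues such as number 50 in the toy example, which are far from the mutated site in the primary sequence but adjacent in the folded protein.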
Diagnoses with causal therapeutic implications. For five patients in the TRANSLATE-NAMSE cohort with a molecular diagnosis, personalized treatments or therapies directed against the mechanism of the disease could be initiated54. A patient with metachromatic leukodystrophy (MLD) due to pathogenic variants in arylsulfatase A (ARSA) was treated with autologous CD34+ cells that were transduced ex vivo using a lentiviral vector encoding ARSA55. The gene therapeutic approach with atidarsagene autotemcel has been authorized by EMA in the EU since 17 December 2020. A patient with pyruvate dehydrogenase E1-alpha deficiency due to a de novo variant in PDHA1 and another patient with GLUT1 deficiency due to pathogenic variants in SLC2A1 were treated with a ketogenic diet. In a patient with cerebral creatine deficiency syndrome 1 due to a missense substitution in SLC6A8, supplementation with creatine was started. In a patient with congenital disorder of glycosylation type IIc due to a homozygous missense variant in SLC35C1, the fucosylation deficiency was treated by oral fucose supplementation56.