Methodology for Generating Datasets with Characteristic Diagnostic Parameters of Rare Diseases Using the Example of Pompe Disease, Gaucher Disease and Smith-Lemli-Opitz Syndrome

Finding a diagnosis for rare diseases is a challenge for patients and those treating them. Establishing a uniform methodology for specifying a patient's symptoms seems useful. This, together with a database of clinical parameters that have been reported in patients already diagnosed with the corresponding disease or that have led to the diagnosis, would facilitate the global exchange of data between specialists and, subsequently, diagnosis. The aim of this work is to develop standardized data sets with the most frequent symptoms, using the three rare diseases late-onset Pompe disease, Gaucher disease Type I and Smith-Lemli-Opitz syndrome (SLOS) as examples.

are intended to improve prevention and diagnosis and to guarantee high-quality healthcare for patients with rare diseases throughout Europe. The Clinical Patient Management System (CPMS) designed for the ERNs is intended to facilitate cross-border diagnostic and therapeutic consultations and the safe exchange of patient data through the cooperation of specialists across European borders, in compliance with data protection regulations.
In order to facilitate this exchange, it seems useful to establish a uniform methodology for specifying the symptoms of a patient. Furthermore, a type of database with clinical parameters that are reported in patients already diagnosed with the corresponding disease or that have led to the diagnosis would facilitate the exchange of data and the subsequent diagnosis.
Within the scope of this work, data sets with the most frequent symptoms and diagnostic criteria for three exemplarily selected rare diseases are to be developed on the basis of a literature analysis as well as the examination of patients of two reference centers. For these data sets, a standardized word form is to be chosen that enables European or even worldwide exchange and can also be used for the CPMS.
The three genetically determined rare metabolic diseases selected as examples are late-onset Pompe disease, Gaucher disease Type I and Smith-Lemli-Opitz syndrome (SLOS).
Late-onset Pompe disease, a lysosomal storage disease, results from a genetic deficiency of the enzyme acid alpha-1,4-glucosidase, which hydrolyzes glycogen to glucose. The resulting accumulation of glycogen in the lysosomes due to the enzyme defect leads to damage of glycogen-storing organs, such as muscles. The prognosis can be improved by enzyme replacement therapy, also depending on the severity of the enzyme deficiency.
Gaucher disease Type I, another lysosomal metabolic disorder, is caused by a deficiency of the lysosomal enzyme beta-glucocerebrosidase, due to a defect in the GBA gene. Glucocerebroside, an unsplit metabolic product of glycosphingolipids, accumulates and forms Gaucher cells. In Type I patients hepatosplenomegaly, bone involvement and hematologic system changes may be present. Type I can be treated by enzyme replacement therapy.
SLOS is caused by a genetic defect in cholesterol biosynthesis, so that 7-dehydrocholesterol cannot be converted into cholesterol due to a lack of 7-dehydrocholesterol reductase. The disease is characterized by numerous craniofacial dysmorphic signs, 2-3 toe syndactyly, psychomotor retardation and organ malformations. Symptomatic treatment with cholesterol supplements and/or the HMG-CoA-reductase inhibitor simvastatin is performed.
For all three diseases, there are already published studies of the symptomatology and diagnostic parameters of differently sized patient cohorts. However, the literature research carried out shows how difficult it is, given the different focal points and vocabulary used, to bring symptom descriptions from different studies to a common denominator, to summarize them or even to derive diagnoses for patients who have not yet been diagnosed.
Compressed and standardized data sets should contribute to facilitating the path to diagnosis and the exchange with specialists for those treating and affected by the disease.

Methods
In cooperation with the University Children's Hospital Magdeburg and the Center of Excellence for Rare Metabolic Diseases at the Charité Berlin, three hereditary metabolic diseases were selected for the development of the data sets: late-onset Pompe disease, Gaucher disease Type I and SLOS. For these diseases, both centers have their own experience and larger patient collectives.
First, a systematic literature analysis was performed with regard to characteristic symptoms and diagnostic criteria of the three diseases mentioned. For this purpose, the book Vademecum metabolicum [1], specialist portals such as Orphanet and OMIM (Online Mendelian Inheritance in Man) as well as numerous other publications were used. The result was a table with 67 diagnostic parameters for Pompe disease. In addition to the literature analysis for late-onset Pompe disease, the parameters for the infantile-onset form of the disease were also researched. For Gaucher disease, a table of 63 symptoms and diagnostic criteria was obtained, again differentiating between Gaucher types I, II and III. The literature search for SLOS yielded 84 symptoms and diagnostic criteria. If the literature classified the frequency of occurrence of a diagnostic parameter, this classification was adopted in the table. Frequency is subdivided into very frequent, frequent, occasional and rare; mainly very frequent, frequent and some occasional features were adopted in the table. However, since these frequency data are not available for all parameters, a complete ordering of the parameters according to their frequency as reported in the literature is not possible.
Subsequently, these terms, formulated freely or based on the sources, were converted into vocabulary standardized by The Human Phenotype Ontology (HPO), so-called HPO terms. HPO is an online database of standardized vocabulary of phenotypic abnormalities of human diseases [2]. This standardization facilitates the global exchange of disease data [2].
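The conversion of freely formulated terms into HPO terms can be pictured as a lookup from clinical wording to a term identifier and label. The following is only a minimal sketch: the table `HPO_LOOKUP` and the function `to_hpo` are hypothetical names introduced for illustration, the synonyms are examples rather than the mapping actually used in this work, and the identifiers and labels shown should be checked against the current HPO release before any real use.

```python
# Hypothetical lookup table: free-text wording (lower-case) -> (HPO ID, HPO label).
# IDs and labels are given for illustration and should be verified against HPO.
HPO_LOOKUP = {
    "cryptorchidism": ("HP:0000028", "Cryptorchidism"),
    "undescended testes": ("HP:0000028", "Cryptorchidism"),
    "muscular hypotonia": ("HP:0001252", "Hypotonia"),
    "cataract": ("HP:0000518", "Cataract"),
    "hypospadias": ("HP:0000047", "Hypospadias"),
}


def to_hpo(free_text):
    """Return (HPO ID, label) for a free-text term, or None if it is unmapped."""
    key = " ".join(free_text.lower().split())  # normalize case and whitespace
    return HPO_LOOKUP.get(key)


if __name__ == "__main__":
    for wording in ["Undescended testes", "  Muscular   hypotonia ", "unclear finding"]:
        print(wording.strip(), "->", to_hpo(wording))
```

Terms that return `None` would have to be reviewed manually, which corresponds to the cases where no exact HPO term could be assigned.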
In the third step, medical records of patients with a confirmed diagnosis of the respective diseases were analyzed according to these criteria. A retrospective analysis of the medical records of 23 patients with late-onset Pompe disease, 21 patients with Gaucher disease Type I and 25 patients with SLOS was performed. In the tables created for the literature analysis, it was marked which of the parameters occurred in which patient. Each entry was classified as characteristic present, characteristic not present or no specification. In this context, the table was expanded to include characteristics that had not been included during the literature research but did occur in patients.
This resulted in percentage frequencies with corresponding confidence intervals for the occurrence of a characteristic within the patient group. The calculation was performed in two variants. In the first variant, the percentages were calculated for each characteristic across the entire patient cohort of a disease, including the patients with missing information. In the second variant, the patients without information for this characteristic were excluded, so that the percentage frequencies here partly relate to smaller numbers of patients. Based on the second variant, those characteristics were selected that were detected and documented in ≥ 40% of the patients assessed for this characteristic and, at the same time, were present in a certain minimum number of patients.
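The two calculation variants and the selection rule can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code used in the study: the three-state coding uses `True` (present), `False` (not present) and `None` (no specification), the Wilson score interval is an assumed choice because the text does not name the confidence interval method, and the example data are invented.

```python
from math import sqrt


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k of n (assumed method; returns proportions)."""
    if n == 0:
        return (0.0, 1.0)
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def frequencies(observations):
    """Compute both variants for one characteristic across a cohort."""
    total = len(observations)
    present = sum(1 for o in observations if o is True)
    assessed = sum(1 for o in observations if o is not None)
    return {
        "present": present,
        "assessed": assessed,
        # Variant 1: denominator is the whole cohort, missing data included.
        "pct_all": 100 * present / total if total else None,
        # Variant 2: denominator is only the patients with information.
        "pct_assessed": 100 * present / assessed if assessed else None,
        "ci_assessed": wilson_interval(present, assessed),
    }


def is_frequent(stats, min_pct=40.0, min_present=8):
    """Selection rule: >= min_pct among assessed patients and >= min_present cases."""
    return (
        stats["pct_assessed"] is not None
        and stats["pct_assessed"] >= min_pct
        and stats["present"] >= min_present
    )


# Invented example: 23 patients, 10 present, 6 not present, 7 without information.
example = [True] * 10 + [False] * 6 + [None] * 7
stats = frequencies(example)

if __name__ == "__main__":
    print(stats)
    print("selected:", is_frequent(stats))
```

In this invented example the second variant gives 10/16 = 62.5%, while the first gives 10/23, about 43.5%, which shows how strongly missing documentation can shift the reported frequency. The minimum count is passed as a parameter because it differs between cohorts.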

Results
The evaluation of the frequency of occurrence of the diagnostic parameters in the examined patients following the literature analysis was carried out separately for each of the three diseases. The results of the examinations of the patient cohorts for the individual diseases are presented below.
Pompe disease, late-onset form
A total of 23 patients were included in the analysis, 10 female and 13 male. All patients included in the analysis were diagnosed with late-onset Pompe disease (LOPD), based on an abnormal sequence analysis of the GAA gene (documented in 21/23 patients) and/or the measurement of reduced activity of lysosomal alpha-1,4-glucosidase (in 15/23 patients). Age at diagnosis ranged from 12 to 73 years, with a mean of 39.5 years. 7/23 (30%) patients had a positive family history of LOPD. Table 1 presents the 20 studied characteristics that were present in ≥ 8 patients and reached percent frequencies ≥ 40%. Twelve of these features had frequencies of occurrence ≥ 60%.

Gaucher disease Type I
The patient records of 21 patients diagnosed with Gaucher disease Type I were analyzed, 11 female and 10 male. The age at diagnosis ranged from 7 to 81 years, with a mean of 34.2 years. A positive family history of Gaucher disease was known in 3 patients. Sixteen features occurred with frequencies ≥ 40% and were simultaneously present in ≥ 7 patients (Table 2). Ten of these features had percent frequencies of at least 60%.

Smith-Lemli-Opitz syndrome
Twenty-five cases of patients with SLOS were analyzed. Of these, 9 patients were female and 16 patients were male. The diagnosis was made on the basis of an elevated 7-dehydrocholesterol (7DHC) level in plasma or, in one case, amniotic fluid, or on the basis of genetic analysis. The mean age at diagnosis was 12.8 months of life, with a range from the 9th month of gestation (equivalent in calculation to 0 months of life) to 74 months of life. Nine of the examined patients had a known positive family history. Table 3 shows that 17 features occurred with a frequency of at least 40% and were simultaneously present in ≥ 9 patients. An exception is the feature Cryptorchidism, which applies only to the 16 male patients and therefore needed to be present in only ≥ 6 patients. Eight of the 17 characteristics occurred in at least 60% of the cases.

Discussion
The standardized English-language vocabulary developed by the Human Phenotype Ontology includes more than 13,000 terms to describe signs, symptoms, or phenotypic manifestations that characterize specific diseases. Despite this comprehensive registry, converting the terms used by clinicians for documentation into HPO terms proved challenging in some cases. Different vocabulary was used for the same symptoms, or there were minimal differences or variations in the expression or presence of symptoms, making it difficult to assign them to a specific term. One possibility here is the categorization and hierarchical ordering of HPO terms from a generalized abnormality with various subgroups to increasingly detailed characterizations. Thus, in some cases, assignment to a somewhat more general HPO term was necessary. Furthermore, since HPO provides only phenotypic characteristics, this also explains why some very common traits were not standardized by an HPO term. In the future, due to the continuous expansion and updating of the HPO database, a more precise characterization of the symptomatology will be possible. However, this shows that the use of a standardized vocabulary is advantageous for the documentation and exchange of the symptomatology of certain diseases.
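The fallback to a more general term can be illustrated by walking up a parent hierarchy until a term from the data set vocabulary is reached. The sketch below is hypothetical: the function `generalize` and the `PARENT` map are introduced for illustration only, and the parent links are a deliberately simplified toy hierarchy, not an excerpt from the HPO release.

```python
# Simplified, illustrative child -> parent links (assumption, not copied from HPO).
PARENT = {
    "Glandular hypospadias": "Hypospadias",
    "Penile hypospadias": "Hypospadias",
    "Hypospadias": "Abnormality of the genital system",
}


def generalize(term, dataset_terms):
    """Return the term itself or its nearest ancestor contained in dataset_terms.

    Returns None if neither the term nor any ancestor is part of the data set.
    """
    seen = set()
    current = term
    while current is not None and current not in seen:
        if current in dataset_terms:
            return current
        seen.add(current)  # guard against accidental cycles in the map
        current = PARENT.get(current)
    return None


if __name__ == "__main__":
    vocabulary = {"Hypospadias", "Cryptorchidism"}
    print(generalize("Glandular hypospadias", vocabulary))  # nearest listed ancestor
    print(generalize("Cataract", vocabulary))  # not covered by the vocabulary
```

A finding documented in more detail than the data set provides for is thereby counted under the more general term, at the cost of losing the finer distinction.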
Another methodological challenge was the retrospective nature of the study. It is true that both reference centers involved care for their patients according to a detailed care scheme specific to each disease, so that certain diagnoses and examinations are always carried out. However, there are patients who came from other centers or from other countries, so that in some cases only the diagnosis and no or incomplete findings at diagnosis were available. For this reason, and also due to different types of documentation, no information was available for some parameters that were established in the literature analysis. In some cases, it was also not possible to determine whether the characteristic had actually not been investigated or whether it had been investigated but was not present and therefore not documented. Accordingly, the evaluation of some characteristics was only possible on the basis of a smaller group of patients than the original number, since missing data had to be subtracted.
If, as planned, the data sets are expanded by other physicians or within the framework of studies with further patient examples, this will increase their informative value. At the same time, biases that arise due to differences in the focus of different physicians in the study will be compensated for. It should also be considered to what extent an additional temporal classification of the occurrence of the characteristics can be documented. It would then be possible to distinguish between initial symptoms and late symptoms. This precision of the information could further facilitate the diagnostic pathway. The retrospective study carried out here proved to be unsuitable for this purpose because the documentation did not always show when and which symptoms appeared for the first time. Possibly, interviews with the patients in addition to the analysis of the files could provide more concrete information.
Furthermore, a comparative analysis of the results obtained with the existing literature is necessary. In this primarily methodological work, this is done using SLOS as an example.

Smith-Lemli-Opitz syndrome
Typical features of SLOS that also contribute to the diagnosis include craniofacial malformations, psychomotor retardation, malformations of organs and extremities, failure to thrive and developmental disorders. Both in the literature and in the current study, these features are described, but with different emphases and percentages. Table 4 compares the results of the analysis of the SLOS patients with those of 6 other studies. Other typical features, on the other hand, were hardly or not at all documented in the patients studied here. These include, for example, Cataract, which occurred in only one case (1/24; 4.2%), whereas other studies noted frequencies of 28.5% [7] and 12% [6]. Bitemporal narrowing is reported in the cohorts of Donoghue et al. [7] and Quélin et al. [5] with frequencies of 14.3% and 50%, respectively. In contrast, Biparietal narrowing is documented in only one patient in the cohort studied.
These very different frequencies could be explained on the one hand by the fact that an assessment of the facies was probably rarely carried out using a standardized list of features. Often, the features that catch the examiner's eye are documented, but no explicit search is made for specific dysmorphic features. On the other hand, however, there is a very broad spectrum of possible craniofacial abnormalities that vary greatly in severity. This is also reflected in the distinction made by some between Type I with milder manifestations and Type II with more severe manifestations. With regard to the creation of data sets to facilitate diagnosis, this observation argues in favor of concentrating on the frequently occurring features.
In addition, genital anomalies are repeatedly described. In the current study, the feature Cryptorchidism is one of the most frequent features, being present in 50% of the male patients (8/16). Other parameters from the genital system do not reach 40% of the patients studied: Ambiguous genitalia (8.3%), Hypoplasia of penis (6.25%), Hypospadias (37.5%). Ryan et al. reported Hypospadias (3/6; 50%), Micropenis (2/6; 33.3%) and Undescended testes (4/6 patients; 66.7%) [6]. Since genital anomalies are listed as very characteristic features in many descriptions of the syndrome, the rather low frequency of occurrence in the cohort studied here is a remarkable observation that remains to be confirmed by standardized data collection.
Malformations and dysfunctions of multiple organs are repeatedly described in SLOS patients. These include heart defects, renal anomalies, lung anomalies, gastrointestinal abnormalities and structural brain anomalies [8]. These were also detected in the cohort studied, but only in very small numbers of patients, far below the 40% threshold. Among them, structural brain anomalies showed the highest frequency, in 6/24 patients.
In the area of the musculoskeletal system, the feature (Muscular) hypotonia was documented in 50% of cases in the cohort studied here. Balogh et al. also described this feature, in 30.8% of patients [4]. In contrast, Smith, Lemli and Opitz, who first described the syndrome, reported moderate muscular hypertonicity in all three of their patients [3], so that the findings appear contradictory. This may be explained by an observed transition from postnatal muscular hypotonia to later hypertonia [10]. It can be stated that the parameters determined as the most frequent features for SLOS on the basis of the current study were almost all documented in the published studies used for comparison with a percentage frequency of occurrence of ≥ 40%. Exceptions are the characteristics Recurrent infections and Hypotonia.

Conclusion
The aim of the study was to create data sets for three exemplarily selected rare diseases with the most important symptoms of these diseases leading to diagnosis in a standardized vocabulary. For this purpose, a literature review was conducted for late-onset Pompe disease, Gaucher disease Type I and SLOS, followed by a retrospective examination of patient cohorts with regard to symptoms and diagnostic parameters. Using this methodology, 20 parameters presented in the form of HPO terms were identified as the most frequent features for late-onset Pompe disease, 16 for Gaucher disease Type I and 17 for SLOS. It was found that a retrospective analysis based on patient records makes standardization difficult due to missing data and different documentation. It can therefore be concluded that documentation using a template would have facilitated this. In turn, the developed data sets can be used to create this template. It is true that the patient cohorts for rare diseases were relatively large (Pompe: 23 patients, Gaucher: 21 patients, SLOS: 25 patients). However, the discussion conducted for SLOS indicates that there are features that are under- or over-represented in the current study compared to other publications. Thus, in order to obtain as complete and accurate a picture as possible, it is recommended that this study be supplemented with additional patient cases or cohorts to further expand and improve the data sets for CPMS. This and an application of the methodology to other diseases can improve the diagnosis and treatment of patients with rare diseases.

Ethics approval
The ethics committee of the Otto-von-Guericke University in Magdeburg approved the study.

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.