RaDiCo, the French National Program on Rare Disease Cohorts

Background: Rare diseases (RDs) affect nearly 3 million people in France and at least 26-30 million people in Europe. These diseases, which represent a major medical concern, are mainly of genetic origin, often chronic, progressive, degenerative, life-threatening and disabling, and account for more than one third of all deaths occurring during infancy. In this context, there is a need for coordinated information on RDs at the national and international levels, based on high-quality, interoperable and sharable data. The main objective of the RaDiCo (Rare Disease Cohorts) program, coordinated by Inserm, was the development of RD e-cohorts via a national platform. The cohort projects were selected through a national call in 2014. The e-cohorts are supported by an interoperable platform, equivalent to an infrastructure, constructed on the "cloud computing" principle and in compliance with the European General Data Protection Regulation. It is designed to allow continuous monitoring of data quality and consistency, in line with the French Health Data Hub.

Results: Depending on the cohort, the objectives are to describe the natural history of the studied RD(s), establish phenotype-genotype correlations, decipher their pathophysiology, assess their societal and medico-economic impact, and/or identify patients eligible for new therapeutic approaches. Inclusion of prevalent and incident cases started at the end of 2016. As of April 2021, 5558 patients had been included within 13 RD e-cohorts covering 67 diseases integrated in 10 European Reference Networks and contributing to the European Joint Program on RDs. Several original results have been obtained in relation to the secondary objectives of the RaDiCo cohorts. They deal with the discovery of new disease genes, assessment of treatment management, deciphering of the underlying pathophysiological mechanisms, diagnostic approaches, genotype-phenotype relationships, development and validation of questionnaires relative to disease burden, or methodological aspects.

Conclusion: RaDiCo currently hosts 13 RD e-cohorts on a sharable and interoperable platform constructed on the "cloud computing" principle. New RD e-cohorts at the European and international levels are targeted.

conditions have yet to be recognized. Thus, there are needs for high quality, interoperable and sustainable creation and monitoring of (inter)national cohorts of patients with rare diseases.
Coordinated care and research on RDs appear critical for patients and families, in order to better describe the natural history of the disease, to improve the diagnostic procedures, to decipher the underlying pathophysiological mechanisms and to better stratify patients for targeted clinical trials and treatments through personalized medical approaches. It is also expected that research on RDs will increase our understanding of the pathophysiology of common chronic diseases, as RDs often represent a "model of dysfunction" severely affecting a limited number of biological pathways.
Identifying the bottlenecks is essential. Yet, RD clinical management and research in France and Europe have been hampered by a lack of resources at several levels: few scientists work on any given disease; there are few patients per disease, and they are scattered over large geographic areas, causing difficulties in gathering data on their disease; existing databases and biological collections, when they exist, are usually local, small, incomplete, not always quality-controlled, of heterogeneous formats and contents, and are rarely accessible or standardized enough to allow interoperability; phenomes are often complex and only partially described over time, with insufficient interdisciplinary cooperation.
The need to promote networks of expertise for RDs in order to improve both RD clinical care and research has been considered a priority in France since 1995. Indeed, RDs occupy an important place in public health in France: Orphanet (an informational RD website and a directory of expert services, launched in 1997) [1] was created by the Health Ministry; the Alliance Maladies Rares (French federation of patients' organizations) was launched in February 2000 [2]; and the GIS (Groupement d'Intérêt Scientifique) Maladies Rares was created to fund research in the field of RDs. The creation of the first French National Rare Disease Plan (PNMR, "Plan National Maladies Rares") (PNMR1, 2005-2008) [3] allowed access to high quality care, and treatment was facilitated by the creation, at the national level, of RD Reference Centers (RDRCs) and Competence Centers. The PNMR2 (2011-2016) consolidated previous achievements, aiming at reinforcing national and international cooperation [4]. Twenty university genetic laboratories were equipped with Next Generation Sequencing technology for clinical use. In 2014, the RDRCs and RD Competence Centers were grouped into 23 thematic RD Healthcare Networks (RDHNs or "Filières de Santé Maladies Rares"). This move anticipated the 23 RD European Reference Networks (ERNs) that were launched at the end of 2016. The RDHNs coordinate diagnosis, provision of health and social care, and training; they collect healthcare data, develop research programs and write the National Protocols for Diagnosis and Care ("PNDS", standing for "Protocoles Nationaux de Diagnostic et de Soins").
One main objective of the PNMR3 (2018-2022) [5], in line with the vision of the International Rare Diseases Research Consortium (IRDiRC) [6], was to provide an accurate diagnosis within a year of the first specialty medical consultation. The PNMR3, which identified 109 RD Coordinating Centers (RDCCs) associated with 386 constitutive centers and 1840 competence centers, reinforced the links with European research initiatives on RDs. It also aims at strengthening the role of RaDiCo in integrating research data for RDs (Action 11.4).
A need for launching RD cohorts. In this fast-changing context, RD professionals highlighted the critical need to implement nation-wide, multidisciplinary, high-quality cohort studies in order to address key scientific and medico-economic questions. There was strong demand for access to appropriate resources, methods and tools for collecting RD data at the (inter)national level. Given the specificities of RDs (limited number of patients per country, scarcity of relevant knowledge and expertise, and fragmentation of research), they have been considered as a distinctive domain of very high national and European added value. A supporting research program was therefore required to provide essential information on disease history and characteristics, and to foster the identification of underlying molecular mechanisms and genotype/phenotype correlations. New knowledge would ultimately lead to better targeted care and treatments. The structuring boost given by the PNMR2 was considered an opportunity to make collective efforts for building RD cohorts and therefore to propose the RaDiCo (Rare Disease Cohort) project to the national call on cohorts of the first Investments for the Future Program.

The objective of the RaDiCo project was twofold. On the one hand, the scientific objective was to set up several RD e-cohorts with the following aims, according to the idiosyncrasy of each cohort: describe the natural history of the targeted RDs; establish genotype-phenotype correlations; decipher the underlying pathophysiological mechanisms; identify new therapeutic avenues; estimate their societal and medico-economic impact; identify patients eligible for new therapeutic approaches; define a methodological strategy of analysis for cohorts including both prevalent and incident cases (with respect to modeling and bias analyses).
On the other hand, to reach these scientific objectives, another goal was to build a national operational platform, equivalent to an Infrastructure as a Service (IaaS), for implementing a potentially unlimited number of e-cohorts consisting of prevalent and incident cases.
Such cohort projects had to be closely articulated with the above-mentioned established networks on RDs. The PNMR2 also fostered the development of the National RD Data Bank (BNDMR, standing for "Banque Nationale de Données Maladies Rares", Figure 1). The BNDMR was built as part of CEMARA [8] ("Centre Maladies Rares"), which started in 2007 at Necker Enfants-Malades Hospital (AP-HP). The BNDMR, which is dedicated to public health issues, aims at collecting general epidemiological and public health data on all patients with a RD in France, on the basis of a common RD minimum data set [9] and a unique identifier [10] for each RD patient.
The RD cohorts also had to integrate, whenever appropriate, non-French RD expert centers and patients, in order to overcome the low number of patients in one single country with respect to the sample sizes needed for proper statistical power, thereby anticipating the emergence of the future RD European Reference Networks (ERNs).

Results
Thirty-three letters of intent were received after the publication of the RaDiCo call for RD cohorts (see Material and Methods). This call led to the selection of 16 national and/or European RD cohort projects on July 15th, 2014. Among these, 3 have been discontinued following decisions of the Scientific and Plenary committees of the RaDiCo program (see Material and Methods): two at the request of their principal investigators (PIs), since the cohorts had not started, and another in 2019, since it could not start in due time. The groups of RDs targeted by the 13 current cohorts appear in Table 1.
The general framework of the RaDiCo cohorts is presented below (Table 2), with or without associated biocollections at each site, together with the planned inclusion and follow-up period. The start of the inclusions and the inclusion targets are mentioned. All cohorts are multicentric, mainly national but also European (SEDVasc, ECYSCO) and international (GenIDA). Across all the cohorts, a total of 11,650 included patients is expected at the end of the inclusion period (July 2027). As of April 2021, 5558 patients had been included in 13 RD e-cohorts, covering 67 diseases, from ~300 affiliated expert centers (Figure 2).
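As a quick arithmetic check on the inclusion figures quoted above, the share of the overall target reached by April 2021 can be computed directly. The sketch below uses only the two totals given in the text; the constant and function names are illustrative and are not part of the RaDiCo platform.

```python
# Totals quoted in the text; all names here are illustrative assumptions.
INCLUDED_APRIL_2021 = 5558   # patients included as of April 2021
EXPECTED_TOTAL = 11650       # patients expected at the end of inclusion (July 2027)


def inclusion_progress(included: int, expected: int) -> float:
    """Return the percentage of the inclusion target already reached."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return 100.0 * included / expected


progress = inclusion_progress(INCLUDED_APRIL_2021, EXPECTED_TOTAL)
remaining = EXPECTED_TOTAL - INCLUDED_APRIL_2021
print(f"{progress:.1f}% of the target reached, {remaining} patients still to include")
```

Under these figures, slightly less than half of the expected patients had been included roughly six years before the planned end of the inclusion period.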
Each implementation step of the e-cohorts is presented in Figure 3. These cohorts were supported by a platform enabling information flow and communication, which has been set up in the framework of the RaDiCo program. This platform, which fulfills the necessary requirements of an Information System (IS), has been assessed by an independent external auditor. Moreover, it contributed to the design of the IS from the Inserm "France Cohorts" program designed to support not only RD e-cohorts, but also cohorts of patients with common multifactorial disorders, as well as population-based epidemiological cohorts.
Primary objectives of the RD cohorts are not yet achieved; however, several results have already been obtained in relation to the secondary objectives of several RaDiCo cohorts. They deal with the discovery of new disease genes, assessment of treatment management, deciphering of the pathophysiology and diagnostic approaches, genotype-phenotype relationships, development and validation of questionnaires relative to disease burden, or methodological aspects. New genes whose mutations are responsible for primary ciliary dyskinesia (PCD) have been identified through molecular and cellular studies performed in the framework of the RaDiCo-DCP cohort. This cohort has been built on the deep phenotyping of the patients, which includes the ultrastructural defects of the microtubule-based structure of motile cilia and sperm flagella (i.e. the ciliary and flagellar axonemes).
These organelles contain several dynein arms, each of them consisting of multiprotein complexes that carry an ATPase activity required for ciliary/flagellar motility. Specific phenotypes have been associated with mutations in these new genes that code for different classes of proteins. TTC12 is believed to be a co-chaperone involved in the cytoplasmic pre-assembly of dynein arms [11]; the lack of GAS2L2 causes PCD by impairing cilia orientation and mucociliary clearance [12], whereas DNAH9 encodes one of the axonemal dynein chains [13]. As for DNAJB13, it encodes an HSP40 family member involved in the proper building of the ciliary and flagellar axoneme [14]. In addition to the identification of those new molecular causes of PCD, one of the key results obtained through functional studies performed on both patients' primary airway epithelial cells (AECs) and CRISPR-Cas9-edited human primary AECs is the existence of distinct dynein assembly mechanisms in human motile cilia versus flagella [11]. As for the developmental eye defects reported in patients from the RaDiCo-AC-OEIL cohort, de novo missense variants have been identified in FBXW11, a gene that encodes an F-box protein involved in ubiquitination and proteasomal degradation [15].
Assessment of treatment management - From a therapeutic viewpoint, as shown in the RaDiCo-SEDVasc cohort, which is dedicated to patients with a rare genetic connective tissue disorder called vascular Ehlers-Danlos syndrome, the assessment of treatment management has revealed the impact of different therapies on morbidity and mortality. Indeed, in this disease condition due to mutations in COL3A1, it has been shown that patients treated with celiprolol, a beta blocker, had a better survival than those not treated with celiprolol and that the observed reduction in mortality was dose-dependent.

Methodological aspects - Several tools for harmonizing our approaches in our own country and in Europe have been developed. We defined a way to federate RD patient identities [10]; developed Cerberus, an access control scheme for enforcing least privilege in patient cohort study platforms, for the RaDiCo-GENIDA cohort [31]; proposed an overview of the current situation and experiences of the national RD registries in Europe [32]; and, with other European colleagues, shared recommendations for improving the quality of RD registries [33, 34]. The two latter papers were used for the design of our e-Health approach.
Moreover, RaDiCo is involved in the French PNMR3 and contributes to the EJP-RD program [ 35 ].

Discussion
The RaDiCo program is intended to promote, through the support of RD e-cohort projects, the collection of phenotypic data for epidemiological and clinical research purposes in connection with basic and translational research. To this end, we set up an operational team (cf. Material and Methods section) whose mission is to establish a centralized platform of expert services and tools to ensure the installation and follow-up of the RD e-cohorts. These include consulting services (for instance legal and regulatory services, or clinical research quality tools) and the development and provision of innovative information technology tools (electronic case report forms (eCRFs), interoperability solutions, or access to e-health tools).
The preparatory period of the cohorts was slowed down by the unavoidable delays in obtaining all the ethical and regulatory authorizations, as well as by evolving regulations both at the EU level, with the new European regulation on personal data safety and security (GDPR) [36], and at the national level, with modifications of the law on research involving human subjects (French "Jardé law") [37]. Moreover, a consortium agreement had to be signed between the partners to formalize the legal links between them, set the modes of governance of the project on scientific, strategic and operational plans, and outline the scientific valorization in terms of intellectual property, publications and citations of the project. A set of 19 Key Performance Indicators (KPIs) was implemented to monitor the cohorts' follow-up.
Once initiated, the 13 ongoing RD cohorts progressed appropriately and the first results related to the secondary objectives of several cohorts have been published in international peer-reviewed journals.
The costs of informatics providers and Clinical Research Organizations (CROs), and the effort needed to manage them, are substantial. Public-Private Partnerships helped to build our sustainability plan. Legal contracts were set up for consortium and collaboration agreements, enabling data collection and sharing.
At the cohort level, we industrialized our production processes, as well as tooling methods such as standardized operating procedures or documents shared online. Recurring difficulties are the shortage of manpower for data entry, the difficulties in identifying cases in the different hospital information systems, and the legal barriers to collecting data from deceased children. We implemented solutions by identifying key data and prioritizing the fields related to the main objectives, to be completed by the clinical research technicians with support from the residents for filling in the medical information. Clinical research technicians have been recruited by the RDHNs, and specific resources have been obtained through targeted research projects supported by Public-Private Partnerships or applications to specific grants.
We explored the impact of RaDiCo for the medical community. Based on current information provided by all the cohorts' participating teams, the program should contribute to significant improvements in patients' care and outcome. Indeed, the cohorts are designed to allow structured collection of disease symptoms at various stages of the pathological processes. Expected benefits include better knowledge of the natural history of all the investigated RDs, recognition of relevant comorbidities, novel proposals for healthcare management, and, ultimately, better quality of life for the patients and their families. In line with the objectives specifically defined for each cohort, other expectations are: identification of relevant disease biomarkers for diagnosis, disease severity and exacerbations; patient selection for clinical and therapeutic trials; and production of novel quality of life questionnaires. The development of electronic health tools is also among the secondary objectives of several cohorts. These tools are designed to help patients self-report their symptoms as well as their medical management, and ultimately to assist health care providers in making appropriate changes to medication use.
Altogether, the cohort studies should help progress towards a more personalized medicine, particularly for appropriate investigations and therapeutic strategies. They should contribute to decreasing the burden of these chronic diseases and to lessening their socio-economic impact.
The RD cohorts are multidisciplinary and, for most of them, include molecular diagnostics and research laboratories. Organization and standardization of biological sample collections (biobanks) are part of the program. Strong interactions and collaborations between clinicians and basic scientists are developed to identify genetic and environmental determinants in well-defined groups of patients, to establish genotype/phenotype correlations, and to progress in the understanding of the underlying molecular and cellular mechanisms.
Rare diseases carry high morbidity and mortality; they represent a significant burden for the health care systems. Evaluation of the economic impact of RDs is therefore critical. This implies a dedicated focus on cost-effectiveness analyses, which evaluate both the costs and results of the health care systems and organizations applied in RDs. The final goal is to appropriately allocate health care resources to RD patients all over the country. This requires that RDs be adequately traceable in the national health information systems. In line with these needs, the Ministry of Health has set up a national bank for rare disease data (BNDMR). All the reference centers and their network of teams in France have the obligation to implement this data bank. Standardized and detailed patient information collected through the cohort programs supports the production of economic indicators. Moreover, access to the

Conclusion
The RaDiCo program promotes the collection of RD phenotypic data of different types for epidemiological and clinical research purposes in connection with basic and translational research. The RaDiCo platform offers the cohorts an information system based on the cloud principle (Infrastructure as a Service) together with a common core of services and specific procedures for some cohorts to ensure installation and follow-up of the RD e-cohorts. The RaDiCo IS can drive a virtually unlimited number of RD e-cohorts, a major strength compared to other IS that usually deal with only one cohort. Currently, 13 RD e-cohorts are implemented and other national, European and international cohorts are targeted.

RaDiCo's national call for implementing RD cohorts
In 2014, RaDiCo launched a national call for RD cohort proposals organized as a two-stage procedure (letter of intent / full application). Guidelines and templates for candidate cohorts were designed and made accessible online to applicants. The full dossiers were evaluated by independent international experts (at least 3 per project), with pre-established evaluation criteria.

The RaDiCo Scientific Committee (SC) for the whole program is composed of 18 members with complementary expertise, associating coordinators of RDRCs and RDHNs, epidemiologists, biostatisticians, experts in Information Systems, molecular geneticists, directors of Inserm Research Units in the field of RDs, and representatives of Inserm Thematic Institutes. Its role is to support the Executive Committee concerning strategic issues and scientific governance of the program, including management of the call for projects and specific follow-up of cohorts; and to contribute to the national call for RD cohort projects launched by RaDiCo in 2014.
An Executive Committee (EC) is shared by all the cohorts. It is composed of 4 members with complementary expertise (SG, AC, PL and SA). Its role is to ensure the implementation, development and monitoring of the entire program. Its main missions are operational management and deployment planning of the sustainability and internationalization of the RaDiCo platform, the links with each RD cohort, and a factorization of the shares. The EC is responsible for the scientific management of the program.
The management of the scientific program is implemented at different levels: by the Executive Committee, the Scientific Committee, the Inserm Thematic Institutes of "Public health", "Technology for Health", "Genetics, Genomics and Bioinformatics", as well as by the 3 scientific governance bodies of each RD cohort. Indeed, each cohort has its own governance, on a scheme comparable to that of the whole program, with a Plenary Committee, a Scientific Committee and an Executive Committee.

The RaDiCo platform uses exchange formats and data security measures in compliance with the European General Data Protection Regulation (GDPR). The RaDiCo work plan for each cohort is structured to achieve four levels of interoperability: technological, semantic, syntactic, and institutional. This allows networking and optimizing the use of existing RD patient cohorts at the EU level, while allowing integration of new types of data and technologies. The data of each RaDiCo cohort and associated services have to be compliant with the FAIR principles (Findable, Accessible, Interoperable and Reusable) by both people and computers. An important step in the FAIR data approach is to publish existing and new datasets from RaDiCo cohorts in a semantically interoperable format that can be understood by computer systems.

Discovery of new disease genes
• TTC12 loss-of-function mutations cause primary ciliary dyskinesia and unveil distinct dynein assembly in motile cilia vs. flagella [11]
• Lack of GAS2L2 causes primary ciliary dyskinesia by impairing cilia orientation and mucociliary clearance [12]
• Mutations in outer dynein arm heavy chain DNAH9 cause motile cilia defects and situs inversus [13]
• Mutations in DNAJB13, encoding an HSP40 family member, cause primary ciliary dyskinesia and male infertility [14]
• De novo missense variants in FBXW11, a gene that encodes an F-box protein involved in ubiquitination and proteasomal degradation [15]

Assessment of treatment management
• Vascular Ehlers-Danlos syndrome: long-term observational study [16]

Pathophysiology and diagnostic approaches
• Accuracy of clinical diagnostic criteria for patients with vascular Ehlers-Danlos syndrome in a tertiary referral centre [17]
• Functional assessment and phenotypic heterogeneity of SFTPA1 and SFTPA2 mutations in interstitial lung diseases and lung cancer [18]
• Health-related quality of life in infants and children with interstitial lung disease [19]
• Pulmonary fibrosis in children [20]
• Chronic interstitial lung diseases in children: diagnosis approaches [21]
• Pulmonary hemosiderosis in children with Down syndrome: a national experience [22]
• Paediatric sarcoidosis [23]
• Genetic causes and clinical management of pediatric interstitial lung diseases [24]

Genotype-phenotype relationships
• Infertility in an adult cohort with primary ciliary dyskinesia: phenotype-gene association [25]
• Primary ciliary dyskinesia gene contribution in Tunisia: Identification of a major Mediterranean allele [26]
• Alport syndrome: a unified classification of genetic disorders of collagen IV α345 [27]
• Genetics of anophthalmia and microphthalmia. Part 1: Non-syndromic anophthalmia/microphthalmia [28]

Development and validation of burden questionnaires
• Burden of albinism: development and validation of a burden assessment tool [29]
• Burden of adult neurofibromatosis 1: development and validation of a burden assessment tool [30]

Methodological aspects
• Federating patients identities: the case of rare diseases [10]
• Cerberus, an access control scheme for enforcing least privilege in patient cohort study platforms [31]
• National registries of rare diseases in Europe: an overview of current situation and experiences [32]
• Recommendations for improving the quality of rare disease registries [33]
• Data quality in rare diseases registries [34]

Figures

Figure 2. Geographical distribution of RD centres contributing to the RaDiCo cohorts in France (blue) and other European countries (orange). Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.