FAIR, ethical, and coordinated data sharing for COVID-19 response: a review of COVID-19 data sharing platforms and registries

Data sharing is central to the rapid translation of research into advances in clinical medicine and public health practice. In the context of COVID-19, there has been a rush to share data, marked by an explosion of population- and discipline-specific resources for collecting, curating, and disseminating participant-level data. We present a comprehensive overview of COVID-19-related platforms and registries that harmonize and share participant-level clinical, OMICs, and imaging data and metadata, and describe how these initiatives map to best practice for ethical, equitable, and FAIR management of data resources. Data sharing resources were concentrated in high-income countries and siloed by comorbidity, body system, and data type. Resources for sharing clinical data were less FAIR than those for sharing OMICs or imaging data. We review gaps and redundancies in COVID-19 data sharing efforts and outline recommendations to build on existing synergies and align with frameworks for effective and equitable data reuse.


Introduction
There are myriad public health, ethical, economic, and scientific arguments for collecting, harmonizing, and sharing public health-related, participant-level data from research studies, disease surveillance systems, and routine clinical care. These include fast-tracking the development and evaluation of preventative measures, diagnostics, and treatments; avoiding the human and economic cost of unnecessary research; and more effectively distinguishing between clinically relevant and spurious sources of heterogeneity to optimize prevention and treatment measures for diverse populations. The urgency of the ongoing COVID-19 pandemic has foregrounded the importance of data sharing, in some cases pitting collaborators or similar initiatives against one another in the quest for funding and for data producers to support data sharing activities.
Many researchers share data by uploading their datasets to data lakes or dataverses: data storage and sharing resources, like GitHub, where the data are not harmonized at the participant level and there are few-to-no restrictions on the types of studies from which data can be shared. In this review, we focused on identifying and describing COVID-19-related platforms and registries, i.e. data sharing resources that conduct prospective or retrospective harmonization of participant-level data. In most cases, registries require data contributors to upload data using a shared case report form (CRF), while researchers can upload datasets with different data dictionaries to data sharing platforms. Both data sharing platforms and registries may limit eligibility to certain types of data or populations. Data sharing platforms generally represent greater investments because of the diverse inputs needed for retrospective harmonization, their focus on high- rather than low-dimensional data types (e.g. OMICs, including human and pathogen genomic data, human metabolomic data, etc., and imaging rather than clinical data), and more expansive inclusion criteria, which allow for the collection of a greater volume and diversity of data (see Supplementary Table 1 for working definitions of data sharing resources).
Collecting participant-level data and descriptive metadata and harmonizing and sharing participant-level data are resource intensive activities that require expertise in physiology, diagnostics, the trajectory and etiology of infection, risk factors and comorbidities, standards for the interoperability of meta- and participant-level data, harmonization, data sharing-related laws, research ethics, and community engagement. In addition to concerns about maximizing data sharing investments through fostering the interoperability of related platforms and registries, the rush to facilitate COVID-19-related data sharing through the extension of existing platforms and the establishment of novel registries raises a number of questions related to how data sharing efforts map to the FAIR principles for data resources 1 and best practice for the ethical reuse of participant-level data 2,3.
To explore these and other questions, we collected data on a number of domains of interest for evaluating how resources for collecting, harmonizing, and sharing participant-level COVID-19 data and related metadata correspond to frameworks for public health-related data sharing, including the Global Health Security Initiative and Global Research Collaboration for Infectious Disease Preparedness (GloPID-R) Principles of Sharing Data in Public Health Emergencies 4, the COVID-19 National Core Studies (NCS) Data Sharing Principles 5, the Global Alliance for Genomics and Health (GA4GH) Framework for responsible sharing of genomic and health-related data 6, and the CARE Principles for Indigenous Data Governance 7.

Methods
We conducted a monthly search of Google and Google Scholar between May 2020 and June 2021 using text terms for COVID-19 and for data sharing resources (Supplementary Note 1) to identify relevant platforms and registries that collect, harmonize, and share COVID-19-related participant-level clinical, human or pathogen OMICs, and high dimensional imaging data. To account for English-language bias in the search strategy, we contacted investigators who work on COVID-19-related data sharing in Asia, Africa, and Latin America and applied natural language processing (NLP) to the COVID-19 Open Research Dataset (CORD-19).

We consulted with end users of harmonized, participant-level COVID-19 data from different fields to identify information that would be useful for them to evaluate the utility of different data sharing resources. We collected general information on each resource (e.g. lead organization, location, funding), linkages between data types at the participant level, resource metrics for success (e.g. number of dataset uploads and downloads), criteria used to evaluate resource adherence to the FAIR principles and the outcomes of those evaluations, data access mechanisms and governance structure, deidentification of data, ethics review and broad consent related requirements, community engagement, and benefit sharing with data contributors and source communities. We developed a REDCap 9 questionnaire (Supplementary Note 3) to collect the required information and distributed the survey to 31 data platform or registry teams where the required information was not provided on the resource website. Following four months of bimonthly reminders, 18 of the 31 data sharing resources we contacted completed the online survey. Where survey responses differed from information available online, we used the survey data.
How FAIRness is evaluated depends on the data type and community-specific needs and preferences. When the FAIR principles were first published 1, they were necessarily aspirational and vague. Over time, different interpretations and extensions of the principles have developed, alongside a number of assessment tools (listed at https://fairassist.org). We conducted a qualitative evaluation of registry and platform adherence to the FAIR principles using four basic criteria: (1) whether the resource was discoverable via a persistent identifier (PID); (2) whether information on how to access data was available on the resource website; (3) whether the resource implemented a community-developed standard for participant-level data or metadata; and (4) whether the resource specified a data usage license or agreement. We conducted a quantitative evaluation of how registries for sharing participant-level clinical data align with the FAIR principles by applying the FAIRshake algorithm 10 to a set of criteria that we identified as most important for evaluating the utility of these resources (Supplementary Note 4). We adapted existing criteria for our specific use case through a combination of a manual review of the FAIR maturity indicators 11 and the Research Data Alliance (RDA) FAIR Data Maturity Model output 12 and a review of the algorithms used by semi-automated tools, including FAIRshake 10, FAIR Evaluator 11, and FAIR-Checker 13. All figures were created in Tableau Desktop 2021.2 with the exception of Figure 2, which was produced through open source code on FAIRsharing.org.
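The four-criterion qualitative rubric can be sketched as a simple boolean checklist. The following is an illustrative reimplementation under our own assumptions, not the FAIRshake pipeline; the resource records shown are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    has_pid: bool                   # criterion 1: persistent identifier (e.g. DOI)
    access_info_online: bool        # criterion 2: access instructions on website
    uses_community_standard: bool   # criterion 3: standard for data or metadata
    has_license: bool               # criterion 4: data usage license or agreement

def fair_score(r: Resource) -> int:
    """Count how many of the four basic criteria a resource meets."""
    return sum([r.has_pid, r.access_info_online,
                r.uses_community_standard, r.has_license])

def fair_label(r: Resource) -> str:
    # Resources meeting 0-1 criteria were classed as "not very FAIR",
    # those meeting 2 or more as "FAIR enough".
    return "FAIR enough" if fair_score(r) >= 2 else "not very FAIR"

# Hypothetical examples, not resources from our dataset
omics_platform = Resource("ExamplePlatform", True, True, True, True)
clinical_registry = Resource("ExampleRegistry", False, True, False, False)
```

A rubric of this kind is deliberately coarse: it classifies resources on information that can be checked from a website, without attempting to score machine-actionability in depth.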

Data availability
The dataset describing the 68 platforms and registries that collect, harmonize, and share COVID-19-related participant-level clinical, OMICs, and/or imaging data, and an additional 13 meta-repositories that share or otherwise facilitate access to COVID-19-related datasets or data sharing resources, is available for comment on Zenodo (https://zenodo.org/record/5101817#.YRMHx44zaUk). Twenty-eight of these resources, which provide information on registry contacts, license, support information or data accessibility conditions, and where the data, although not public, are accessible on request and/or published in a scientific paper and/or shared as a report or dashboard, were further described in FAIRsharing 14, a global resource that interlinks databases, standards, and policies.
Metadata for these resources are available on FAIRsharing via a dedicated collection: https://fairsharing.org/collection/TDRCOVID19Participantleveldatasharingplatformsregistries

Results
We identified 47 registries and 21 platforms that collected, harmonized, and, in some cases, shared participant-level COVID-19 human subjects' data. All but two of these were identified through the monthly Google searches rather than the NLP approach (see Supplementary Table 2 for citations identified through the NLP strategy). COVID-19 data sharing resources were overwhelmingly data type specific. Almost all registries (45 of 47) were limited to clinical data; two included clinical and high-dimensional imaging data. Eleven of the 21 platforms included OMICs data and six included high-dimensional imaging data (e.g. CT scans). Nine platforms included more than one data type.
Long COVID affects approximately 30-87% of adults 15,16 and 5-8% of children 17,18 who are infected with SARS-CoV-2, and harmonized, longitudinal datasets with linked clinical, human and pathogen OMICs, and imaging data may facilitate long COVID-related prognosis and treatment and help identify participant-level factors correlated with the emergence of variants of concern (VOC) and VOC-related differences in etiology and vaccine efficacy. Supplementary Figure 1 shows registry- and platform-specific participant-level linkages between data types. About a third of platforms (N=9) and half of the registries (N=25) included longitudinal clinical data. While no registries included human or pathogen OMICs data, six platforms included longitudinal human OMICs data; two of those six also included longitudinal imaging data. Four platforms (CanCOGeN, N3C, dbGaP, ReCoDID) included linked longitudinal clinical and human and pathogen OMICs data. Only one platform (N3C) included longitudinal data on all data types, including clinical, host and pathogen OMICs, and high-dimensional imaging data.
Most registries (N=44), but only a few platforms (N=4), limited data to populations with a particular coinfection, comorbidity, assessment, treatment, or outcome of interest. There were several instances of registries that covered the same comorbidities, including six registries for different forms of cancer, four registries for blood conditions, four registries related to cardiovascular system diseases, seven registries for skin conditions, three registries for rheumatic disease, three registries for issues related to the digestive system, two registries for liver disease, two registries for neurological conditions, and two registries for diabetes. An additional three registries were limited to individuals with kidney disease, individuals with multiple sclerosis, and patients receiving extracorporeal membrane oxygenation (ECMO), respectively. Several registries collected data on pediatric (N=4) or pregnant (N=2) populations. Of those, two registries included data on pediatric cancer patients and one on pediatric patients with rheumatic disease. All platforms and two-thirds of registries (N=31) included data from participants of all ages. Nine registries were limited to data on adults aged 18 and over.

Supplementary Figures 2 & 3 show the global distribution of platforms and registries for sharing participant-level, COVID-19-related data. One third of platforms (N=7) and 59% of registries (N=28) were based in the US; 57% of platforms (N=12) and 36% of registries (N=17) were based in Europe; and one registry each was based in Brazil and Israel. Most platforms and registries (N=17, 71% and N=28, 59%, respectively) accepted data from any country; six platforms (28%) and 19 registries (40%) were country- or region-specific.
For resources that collected clinical data, most registries (N=43; 91%), but only one platform, were limited to prospective harmonization of participant-level data through a shared electronic case report form (eCRF). Four platforms (19%) and two registries (4%) conducted both prospective and retrospective harmonization; one registry for clinical data conducted only retrospective harmonization (ACR CIRR). Most registries and platforms that included prospective harmonization of clinical data provided a REDCap-based eCRF (29 of 54 platforms and registries; 54%). Other registries and platforms used Qualtrics (N=2), SurveyMonkey (N=2), OpenApp (N=2), or QMENTA (N=1) data capture software. At the beginning of the epidemic, the World Health Organization (WHO), the International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC), and the Infectious Diseases Data Observatory (IDDO) created an open access series of REDCap-based eCRFs which applied CDISC's SDTM standards (https://doi.org/10.25504/FAIRsharing.s51qk5). Of the 54 resources that included clinical data, only one platform (IDDO) and one registry (CAPACITY) reported using the WHO/ISARIC/IDDO eCRFs, which represents a missed opportunity for prospective harmonization in COVID-19 response.
We present an overview of how COVID-19-related resources for collecting, harmonizing, and sharing participant-level data map to the FAIR principles and best practice for ethical and equitable data sharing in Table 2 (see Supplementary Table 3 for related text from each set of principles). The FAIR principles focus on the machine-actionability of data and related metadata and on their findability, accessibility, interoperability, and reusability 1. The quantification of how registries for sharing participant-level clinical data map to the FAIR criteria is presented in Supplementary Table 3, with the important caveat that quantitative evaluations are at an exploratory stage. The community continues to work on harmonizing the algorithms used to evaluate the application of FAIR indicators across disciplines, as many tools for quantifying FAIRness yield divergent results. We therefore focus our discussion on the results of the qualitative evaluation of FAIRness, using the four main criteria described earlier. We considered resources that met none or one of the criteria as not very FAIR and resources that met two or more criteria as FAIR enough. As shown in Figure 1a, platforms were generally more closely aligned with the FAIR principles than registries and resources that were comorbidity or population specific. As indicated in Figure 1b, registries and platforms for harmonization of clinical or epidemiological data were much less FAIR than those that included high-dimensional data types.
Our evaluation, which included the registration and curation of eligible resources identified through our search in FAIRsharing.org, improved the FAIRness of a number of platforms and registries by improving the discoverability and availability of descriptive metadata. Specifically, a digital object identifier (DOI) was assigned to two platforms and 18 registries that did not previously have a PID, which is central to findability. Additionally, we recorded information on the data accessibility mechanism and terms of use, elements essential to accessibility, on FAIRsharing.org. Lastly, we collected and recorded information about the data and metadata standards used by these resources; standards are fundamental to interoperability and reusability.
Community-developed standards, which include minimal information reporting requirements, terminologies, models, and formats, are essential to structure data in a manner that is unambiguous for humans and machines. Standards are more clearly defined and widely used for high-dimensional data types, where machine-readable metadata are defined as part of the data capture (e.g. DICOM standards for imaging data), than for clinical data, which helps explain the comparatively greater FAIRness of resources for sharing OMICs and imaging versus clinical-epidemiological data. Five of the 47 clinical data registries used an eCRF that mapped to internationally accepted standards for clinical data, including International Classification of Diseases (ICD)-10 codes (n=2), Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT; n=1), Critical Care Data Dictionary (C2D2; n=1), Current Procedural Terminology (CPT) codes (n=1), Unified Medical Language System controlled unique identifier (UMLS CUI; n=1), and Anatomical Therapeutic Chemical (ATC) classification (n=1). Of the platforms that share OMICs data, we identified only four that map to internationally accepted metadata or participant-level data standards. Of these, two follow Minimum Information About a Microarray Experiment (MIAME) standards (ImmPort, GEO). Shared community-developed standards for participant-level data facilitate cross-resource analyses. Figure 2 shows the relationships between the 28 resources for sharing participant-level COVID-19 data included in the FAIRsharing Collection (https://fairsharing.org/collection/TDRCOVID19Participantleveldatasharingplatformsregistries), depicted as orange circles, based on their implementation of community-developed standards or shared eCRFs (shown as yellow shapes) or through explicit connections where participant-level data or metadata from one platform are recorded in another platform. The zoomed-in elements show the standards used by the EGA (top) and ENA (bottom) databases.
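To make concrete what mapping an eCRF to a terminology standard involves, the sketch below translates local response options into ICD-10 codes. The field and value names are hypothetical; U07.1 and U07.2 are the WHO emergency-use ICD-10 codes for laboratory-confirmed and clinically suspected COVID-19, respectively.

```python
# Illustrative mapping of local eCRF response options to ICD-10 codes.
# Field/value names are hypothetical; the codes are WHO emergency-use
# ICD-10 codes for COVID-19 (U07.1 confirmed, U07.2 suspected).
ECRF_TO_ICD10 = {
    ("covid_status", "lab_confirmed"): "U07.1",
    ("covid_status", "clinically_suspected"): "U07.2",
}

def harmonize(record: dict) -> dict:
    """Annotate registry values with standard codes where a mapping
    exists; leave unmapped fields untouched for manual review."""
    out = dict(record)
    for (field, value), code in ECRF_TO_ICD10.items():
        if record.get(field) == value:
            out[field + "_icd10"] = code
    return out
```

In practice such mappings are many times larger and are maintained as curated crosswalks rather than hand-written dictionaries, which is part of why so few registries adopted a standard.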
In several instances, platforms for sharing high-dimensional human or pathogen OMICs data or imaging data accept data in any community-developed standard; Figure 2 includes a subset of the standards used in those cases.
While all but one platform (COVID-19 and MS) planned to share participant-level data, 18 of the 47 registries did not intend to share COVID-19-related, participant-level data. Four of the 24 registries and three of the 19 platforms that intended to share data did not provide any information on how to access data on their website. Additionally, six registries that intended to share data provided insufficient information on how to access data on their website (e.g. ACR CIRR, BADBIR, ENERGY, HOPE-2, COVI-PREG, and PsoProtect). Fourteen of the 24 registries (58%) and 10 of the 19 platforms (53%) that intended to share data did not mention a data usage license or agreement on their website.
Seven platforms (33%) and four registries (8%) met all four of the criteria for FAIRness, while four platforms (19%) and 13 registries (28%) met none of the criteria. Platforms that met all of the criteria for FAIRness were large, government-funded platforms that pre-existed COVID-19. The four registries that met all four criteria for FAIRness (ASCO, SCCM, LEOSS, ELSO) limited submission to member institutions. Registries and platforms that met none of the criteria for FAIRness were recently launched, COVID-19-specific resources that are still accepting data and may develop in terms of infrastructure and governance for COVID-19 data collection and sharing in the future. The application of a community-developed standard for participant-level data or metadata was the most commonly missed component of FAIRness: only two platforms and five registries that collected participant-level clinical data adopted a community-developed standard for those data, which is difficult to address retrospectively and limits the interoperability of COVID-19 data sharing efforts.
The protection of human subjects, the governance of and mechanism for data sharing, and engaging in meaningful benefit sharing with the research teams that contributed data and the participants' source communities are of central importance for ethical data sharing 2. One platform (4%) and 18 registries (38%) did not plan to share participant-level data. For three of the 19 platforms (16%) and four of the 24 registries (17%) that had shared or intended to share data, we were not able to identify the data access mechanism through the website. Three of the 10 platforms that shared or intended to share human OMICs data were open access; two others had both open access and private data for which access was controlled by the data generators or a data access committee (DAC); one required DAC permission to access the data; two others required registration to access the data; and the remaining two did not have data access information on their website. Seventeen registries and five platforms that shared clinical data, five platforms that shared human OMICs data, and two platforms that shared linked clinical and human or pathogen OMICs or imaging data included a DAC to review data requests. Requests for data access for five platforms and one registry were decided by the data contributors.
Close to half of registries (N=20; 42%) stated that they were exempt from ethics review committee (ERC) oversight because they only collected de-identified data. While eight platforms only collected de-identified data, no platforms claimed ERC exemption. One platform and three registries stated that they would only accept participant-level data from groups that include broad consent for future use in their informed consent forms or have obtained a waiver of consent.
Other than disseminating aggregate findings through a data dashboard (N=16, 34% of registries; N=2, 10% of platforms), 19 platforms and 29 registries mentioned other forms of benefit sharing, including citation of the groups that provided data (9 platforms, 2 registries), citation of the data sharing resource itself (16 platforms, 9 registries), acknowledgement of data providers or co-authorship on registry/platform-based publications (21 registries, 1 platform), or access to analytic tools (9 platforms, 1 registry). Only 9 registries, 4 platforms, 2 catalogues of platforms, and 1 federation of interoperable datasets and platforms mentioned any form of community engagement. Community engagement activities included community forums to guide the overall direction of the platform.

There are no clearly defined metrics for determining whether a platform or registry is "successful." Data sharing resources reported the number of collaborating centers, datasets, participants represented by those datasets, SARS-CoV-2 genome sequences, registered users, and views or downloads of datasets to describe the breadth of their data collection and dissemination work. Six registries and one platform (ReCoDID) did not include any information that could be used to characterize data submission or reuse. Two of those registries (ACS COVID-19 Registry, Transthoracic Echocardiography in COVID 19 Registry) did not specify if they would share data, and the platform was not yet accepting data.

Platforms and registries require significant initial and ongoing investments, and the sustainability of data sharing resources is a major concern. Eleven registries and one platform received funding from more than one source. Eighteen platforms and one registry received government funding; two platforms and 23 registries received funding from related professional organizations or NGOs; and two platforms and seven registries received funding from industry sponsors.
While some registries received funding from universities (N=12) and private donations (N=4), no platforms were funded by these sources.

Discussion
In this manuscript, we present the results of a year-long initiative by members of the COVID-19 Clinical Research Coalition to understand how participant-level data are being shared for COVID-19 response. In addition to monthly searches, we applied NLP to the CORD-19 database and consulted with colleagues who work on sharing human or pathogen OMICs data or clinical data in Europe, Canada, Africa, Latin America, and Asia to identify resources for collecting, harmonizing, and sharing COVID-19-related participant-level data. We identified 68 platforms and registries for collecting, harmonizing, and sometimes sharing different types of COVID-19 data. For close to half of these, information that could be used to evaluate resource FAIRness or governance practices was not available on the website or in related documentation and was instead collected through our online survey. While we expect that these responses are still current, existing resources have continued to evolve and additional data sharing registries or platforms that harmonize participant-level data have continued to develop since we completed our search in June 2021. Because the relevant data from registries are generally not machine readable, continuously updating and curating the results requires a substantial investment of time. A brief search conducted in October 2021 suggested that an additional 50 registries, including a number of vaccine registries, had been launched.
How do COVID-19 data sharing resources respond to existing data sharing principles?
While the importance of leveraging existing participant-level data and of connecting different data types at the participant level for COVID-19 response cannot be overstated, more resources for data sharing do not mean better data sharing. A number of groups developed principles for sharing different types of human research data prior to or during the COVID-19 pandemic. Below, we review how COVID-19 data sharing platforms and registries map to the cross-framework principles of: collaboration; adherence to the FAIR principles; ethical issues, including transparent governance, protection of sensitive data, and community engagement; compliance with data protection laws; and evaluation of platform utility. The correspondence of data sharing resources to established principles is summarized in Table 2. Commonly shared challenges and recommendations for coordinated data sharing for COVID-19 response are presented in Table 3; stakeholder-specific recommendations are summarized in Table 4.

Collaboration
Data siloed by data type and comorbidity

The siloing of data by data type, comorbidity, and treatment increases the time required for sharing data, as individuals with multiple comorbidities need to be entered into multiple databases, and ultimately diminishes the utility of the data. The existing universe of disease-specific registries may lead to the exclusion of important populations affected by multiple comorbidities. Only a few of the 68 resources included clinical data that were linked to human and pathogen OMICs data at the participant level, which hinders efforts to respond to emerging and established VOCs. In contrast to prior epidemics of emerging pathogens, the partnership between IDDO, ISARIC, and WHO resulted in the rapid publication of a series of REDCap-based eCRFs that apply CDISC SDTM standards. Other than IDDO itself, only one of the 54 data sharing resources that collect COVID-19-related clinical data reported using the IDDO-ISARIC-WHO eCRFs, which represents a missed opportunity. There have been national efforts to facilitate the interoperability of COVID-19 data, including the US Office of the National Coordinator for Health Information Technology Logica COVID-19 Implementation Guide 19, which applies a Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR)-based library of COVID-19-related data elements, and the UK NHS COVID-19 National Clinical Coding Standards 20. International efforts, like the COVID-19 Interoperability Alliance 21, which includes SNOMED, LOINC, and RxNorm, the HL7 International Patient Summary Implementation Guide 22, and the European Health Data Space initiatives 23, have emerged to address cross-national interoperability of electronic medical record (EMR) data.
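To illustrate what an FHIR-based COVID-19 data element looks like in practice, the sketch below builds a minimal HL7 FHIR R4 Observation for a SARS-CoV-2 PCR result as a plain dictionary. The patient reference is hypothetical; the LOINC and SNOMED CT codes shown are, to our knowledge, ones commonly used for this purpose, not necessarily those mandated by any specific implementation guide.

```python
# Minimal sketch of an HL7 FHIR R4 Observation for a SARS-CoV-2 PCR
# result, the kind of element an FHIR-based COVID-19 library standardizes.
# Patient ID is hypothetical; codes are commonly used but illustrative.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "94500-6",  # SARS-CoV-2 RNA by NAA with probe detection
        }]
    },
    "subject": {"reference": "Patient/example-123"},  # hypothetical reference
    "valueCodeableConcept": {
        "coding": [{
            "system": "http://snomed.info/sct",
            "code": "260373001",  # "Detected" qualifier value
        }]
    },
}
```

Because the terminology bindings (LOINC for the test, SNOMED CT for the result) travel with the record, two systems exchanging such resources need no bilateral data dictionary, which is the interoperability argument these national and international guides make.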
Need for connections between research and clinical data streams

Selection bias, which arises when the participants included in a study or database differ systematically from the population of interest, is an important consideration when accessing data uploaded to the platforms and registries described here. EMR data are an underutilized resource for surveillance and epidemic response 24,25 and represent a less selected population than the populations reflected in data that are manually entered by hospital staff in disease-specific registries.
Formidable barriers, including lack of interoperability, ethical concerns, and EMR-vendor- or hospital-specific barriers to access 26, have prevented coordinated sharing of EMR data and likely led to the current universe of comorbidity- and population-specific registries. In addition to reducing the data entry burden incurred when data are shared through registries rather than EMRs, there are compelling ethical arguments, including the duty of easy rescue, for using EMR data in the public health response to epidemics 27, and several ongoing initiatives aim to facilitate cross-national, interoperable EMR data 23,28-30.

FAIR principles and community standards
Our results show that platforms for sharing high-dimensional participant-level health-related data (i.e. OMICs and imaging data) are better aligned with the FAIR principles than registries for sharing clinical or epidemiological data. This difference is explained in part by the inclusion of machine-readable metadata and community-developed standards for participant-level data as part of the computational processing of high-dimensional data, discipline-specific expectations regarding data availability and the use of community-developed standards, and limited regulatory oversight for observational health research.
Registries for participant-level clinical data were less likely to have been assessed for their adherence to the FAIR principles prior to the COVID-19 pandemic, and how to measure the FAIRness of clinical or epidemiological data is an actively evolving conversation. The FAIR principles focus on machine readability and the (re)use of data at scale; they do not address the quality and utility of the data resource and its content. The FAIR community continues to work towards finalizing cross-disciplinary, cross-data type maturity indicators that can be implemented by any evaluation tool in order to yield consistent results, and our evaluation of resource adherence to the FAIR principles should be read in the context of this evolving landscape. Funders can build on this initial evaluation of clinical research data sharing efforts by bringing together different stakeholders and disciplines to develop indicators to benchmark COVID-19 data sharing initiatives as they move towards FAIR data.
Seventeen of the 68 COVID-19-focused platforms and registries met none of the four basic criteria for FAIRness, which suggests a need to support those groups to enact basic steps to improve their platform's or repository's adherence to the FAIR principles. The application of a community-developed standard for meta- or participant-level data is the most resource intensive of the four criteria and was the least commonly enacted. The role of data and metadata standards as essential elements for the consistent and meaningful reporting and sharing of information precedes the FAIR principles, and their patchy implementation and use is a known issue 14,31. Key challenges for interoperable clinical or epidemiological participant-level data and metadata include: (1) fragmentation, with gaps, duplications, and a lack of intra-standard interoperability, which limits their consistent use, especially between medical and research areas; (2) differences in governance and terms of use, especially between formal standards organizations and grass-roots initiatives, which often limit contributions, extensions, and modifications; (3) lack of funds to implement the standards for participant-level data, train users, curate data, and support the standards life cycle, which is necessary to deal with evolving technologies and emerging data types; and (4) a lack of standards for study metadata. In this analysis, we were not able to directly measure the uptake of community-developed standards by data resources and had to collect information on resource adoption of standards through an online survey. This snapshot of the standards landscape, which will continue to evolve on FAIRsharing.org, should facilitate conversations about the wider adoption of common standards and the need for cross-standard interoperability.

Cross-registry interoperability of participant-level data
The use of community-developed standards for participant-level data and study metadata is an important precondition for interoperable data. The use of different community-developed standards for participant-level data is likely unavoidable and may be addressed retrospectively, as through the application of the Observational Medical Outcomes Partnership common data model (OMOP CDM) 32 . That said, very few platforms or registries applied community-developed standards for participant-level data, further limiting the interoperability of these data sharing initiatives.
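Retrospective harmonization to a common data model is, at its core, a mapping of study-specific fields and codes onto the model's tables and standard vocabulary. The sketch below shows this for a single OMOP-style `person` record; the source field names and the input record are invented for illustration, and a real ETL would cover far more columns and vocabulary lookups.

```python
# Sketch of retrospectively mapping a study-specific record onto columns of
# an OMOP CDM-style 'person' table. Source field names are hypothetical.

# Map study codes to OMOP standard gender concepts (8507 = MALE, 8532 = FEMALE).
GENDER_CONCEPTS = {"M": 8507, "F": 8532}

def to_omop_person(row: dict) -> dict:
    """Translate one source record into OMOP 'person' columns."""
    return {
        "person_id": row["subject_id"],
        "gender_concept_id": GENDER_CONCEPTS.get(row["sex"], 0),  # 0 = no match
        "year_of_birth": int(row["dob"][:4]),
    }

record = {"subject_id": 101, "sex": "F", "dob": "1960-07-14"}
print(to_omop_person(record))
# {'person_id': 101, 'gender_concept_id': 8532, 'year_of_birth': 1960}
```

Because every contributing study maps into the same target tables and concept IDs, downstream analyses can be written once against the CDM rather than per study.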
Comprehensive, machine-readable study and data sharing resource metadata are the first step toward interoperability. Funders may consider extending ongoing efforts to develop guidelines for user-defined metadata 33 , with a focus on clinical metadata, where, in contrast to OMICs and high-dimensional imaging data, key metadata are not defined at data capture. Interoperability of platform metadata and the application of shared standards for participant-level data would represent important progress towards inter-platform or inter-repository interoperability.
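"Machine-readable metadata" in practice often means a structured record that search engines and harvesters can index. As a minimal sketch, a study description could be serialized as JSON-LD in the style of schema.org's Dataset type; the study name, identifier, and field subset below are illustrative placeholders, not a complete clinical-metadata standard.

```python
import json

# Sketch: emitting machine-readable study metadata as JSON-LD, loosely
# following schema.org's Dataset type. All values are hypothetical.
study_metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example COVID-19 cohort",                 # hypothetical study
    "identifier": "https://doi.org/10.xxxx/example",   # placeholder persistent ID
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "variableMeasured": ["age", "sex", "symptom onset date"],
}

record = json.dumps(study_metadata, indent=2)
print(record)
```

Publishing such a record alongside the data lets aggregators discover the study without parsing free-text documentation, which is one concrete step towards the inter-platform interoperability discussed above.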
Ethical concerns & compliance with data protection laws
Ethical or governance-related concerns must be addressed. There are several disparate frameworks for evaluating ethical concerns when sharing participant-level research and EMR data in the research response to a public health emergency. While there is general agreement that broad consent for future use should be sought when sharing de-identified EMR or research data 2 , some groups argue that broad consent, and even informed consent, are not needed for sharing de-identified data 34,35 . Where broad consent for future use was not possible or sought, a waiver of consent may be granted for sharing participant-level data in keeping with the Council for International Organizations of Medical Sciences (CIOMS) guidance 36 . Most countries have legal frameworks for sharing participant-level data in the public health response to an emergency, like the COVID-19 pandemic, irrespective of consent 37 .
Most platforms and registries specified that they would only share de-identified data; seven of these platforms or registries indicated that they were exempt from ethical review because they were only sharing de-identified data. Maintaining data utility while preventing re-identification is an important challenge, especially in the COVID-19 response, where participant-level linkages between data types (i.e., pathogen and host OMICs data and clinical data) are important for detecting and responding to VOCs. Differing definitions of anonymized and pseudonymized data further complicate cross-initiative discussions and approaches 38 . Data sharing resources should consider establishing an independent ethics advisory committee, as distinct from a research ethics committee, that reflects community values and preferences for data sharing and can evaluate key ethical issues. Interoperable governance, consistent definitions, and common approaches to shared ethical and legal issues would both conserve scarce resources and facilitate explicit connections between related data sharing investments.

Equitable distribution of platforms
Multiple groups have highlighted the dangers of parachute research in the context of data sharing 39,40 and indicated that data sharing is perceived as widening existing disparities in access to funding and publication opportunities between researchers in high-income and low- and middle-income countries (LMIC) 41 . Platforms, in particular, represent long-term, significant investments in infrastructure and specialized expertise, and the absence of data sharing platforms in LMIC represents a missed opportunity to support equitable, global data sharing for COVID-19 response.

Community engagement & bene t sharing
Resources that collect, harmonize, and share data have to be responsive to competing needs from a diversity of stakeholders, including data-generating groups, research participants and their source communities, funders, end users (whether academic or commercial), the general public, and the Open Science community. Community engagement is important for ethical data use and for ensuring meaningful benefit sharing. When conducted properly, community engagement engenders trust, fosters understanding and ownership, and promotes the partnerships with communities that can support both data sharing and future research. The most frequently reported forms of benefit sharing were data dashboards and citation of the data-contributing groups. Benefit sharing could also take the form of documenting data sharing-facilitated knowledge translation that could empower governments, the medical community, or the general public to take early action during a pandemic. Fewer than a quarter of registries and platforms reported engaging communities or investing in research capacity building.

Transparent governance
Data access models correspond to different political, ethical, administrative, regulatory, and legal contexts, resulting in different systems for the review and assessment of proposals to access the data. A common system to manage access involves review of an application to access the data by a centralized Data Access Committee (DAC). DACs review and evaluate proposals to access data and are central to ensuring that community values and preferences are reflected in data sharing decision making and in setting public health priorities for data reuse. Independent commissions, rather than individual researchers, should be responsible for ensuring fair and equitable data sharing that balances the interests of data providers (e.g., publication), research participants or patients, and the open science and public health communities. Sixteen registries and five platforms that are or will be sharing participant-level data included a DAC.
Several recent reviews explore best practice for DACs 3,6,42,43 , which includes, at a minimum, community representation; transparency and consistency regarding the process, criteria, and decisions around data requests; and specific steps to avoid conflicts of interest between DAC members and dataset applicants.
Further work is needed to define best practice for data governance, with a focus on interoperable governance of data sharing efforts when responding to PHEICs. In public health emergencies, software approaches to shielded data access (e.g., DataSHIELD 44,45 ), which allow for analysis without end users moving or "seeing" the data, may be a way to address ethical and legal concerns while ensuring timely data access for an informed public health response.
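The core idea behind shielded access is that each data-holding site releases only non-disclosive aggregates, which the analyst then combines. The sketch below illustrates this for a pooled mean; it is a toy illustration of the principle, not the DataSHIELD API, and the site values are invented.

```python
# Sketch of shielded analysis in the spirit of DataSHIELD: each site returns
# only aggregate statistics (sum, count), never row-level data, and the
# analyst combines them into a pooled estimate. Values are illustrative.

def site_summary(values: list[float]) -> tuple[float, int]:
    """What a site is allowed to release: aggregates, not the rows."""
    return sum(values), len(values)

def pooled_mean(summaries: list[tuple[float, int]]) -> float:
    """Combine per-site (sum, count) pairs into an overall mean."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count

# Each hospital computes its summary locally; only these tuples travel.
site_a = site_summary([54.0, 61.0, 47.0])  # e.g., patient ages at site A
site_b = site_summary([70.0, 58.0])        # e.g., patient ages at site B
print(pooled_mean([site_a, site_b]))  # 58.0
```

Real deployments add disclosure controls (e.g., refusing to release aggregates below a minimum cell count), but the division of labour, local computation with only summaries crossing institutional boundaries, is the same.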

Legal barriers to data sharing
Concerns about recent data protection laws, including the GDPR, are likely correlated with siloed data and governance efforts, as when a platform deputizes individual institutions to manage data access rather than pooling the responsibilities arising from data protection law, including establishing a centralized DAC to avoid distributed controllership. Lack of clarity in terminology 38 has contributed to inconsistent interpretations and applications of data protection laws within and beyond Europe, which further hinder the interoperability of governance structures and of initiatives that share interconnected data types. These fears have persisted in spite of provisions to support data sharing in the response to public health emergencies 46 , including Article 9(2)(i) of the GDPR, which allows for the processing of sensitive personal data for reasons of public interest in the area of public health, including protection against serious cross-border threats to health, and Article 49(1)(d) of the GDPR, which provides an exemption for international data transfers that are necessary for important reasons of public interest, which in practice came to include the public health response to infectious diseases.
Many countries lack national legal frameworks related to the cross-border transfer and sharing of participant-level, health-related data. The scope of application of the GDPR is broad and often results in the requirement for research entities in countries outside of Europe to comply with the GDPR when interacting with EU-based institutions, as when submitting, accessing, or receiving participant-level health data. The application of the GDPR to the data processing activities of international organizations actively contributing to health research is contested. However, if EU-based organizations share data with international organizations, they must check the level of data protection within these organizations, as it should be essentially equivalent to GDPR-level protection. Thus, besides the scope of application, transfer rules also quickly extend the reach of the GDPR, making it, on the practical level, the default data protection legislation. Additionally, collision rules are unclear when legal frameworks that prescribe data governance interact across national boundaries, which leads to confusion regarding which rule applies to the same data or to a jointly conducted research activity and may further hinder data sharing.

Quantifying data resource utility
The public health imperative to share data to improve COVID-19 prevention and response has led to a proliferation of data sharing platforms and registries.
There is a real need to understand the return on investment for these data sharing initiatives and to inform strategies to maximize the utility and sustainability of existing initiatives. While a number of case studies seek to demonstrate the utility of data sharing platforms, efforts to describe the public health-related benefits of sharing harmonized participant-level health-related data have been largely qualitative. Future research could identify markers for contributions to and usage of data sharing platforms and for how the harmonization and dissemination of data facilitate research translation, build scientific networks, and lead to new fields of inquiry. In addition to understanding the utility of data sharing initiatives, clear metrics and quantitative approaches to assessing the downstream benefits and harms of data sharing could facilitate an exploration of ethical issues, like whether data generated by researchers in LMIC benefit communities in LMIC and whether data contributors receive some measurable benefit, in terms of novel funding applications, publications, collaborations, or research directions, from sharing data and producing the metadata needed to appropriately interpret those data.

Identifying and supporting successful investments
Platforms, and to a lesser extent registries, require a significant investment of money and time. For example, the IDDO platform began with the World Wide Malaria Network in 2004 and an initial investment of over 20 million USD 47 . Investments in developing the governance and infrastructure of platforms that pre-existed the pandemic helped them transition rapidly to COVID-19 data collection. While established platforms, like IDDO, have shared data on close to 500K participants 48 , COVID-19 platforms created during (e.g., CanCOGeN HostSeq and VirusSeq Portal) or slightly before the pandemic (ReCoDID) were not yet sharing data in July 2021, when the platforms and registries overview dataset was finalized. Understanding which data sharing resources are "successful" in collecting and sharing data is as important as understanding how resources map to the FAIR principles and to best practice for ethical considerations related to international data sharing. We documented a number of metrics for evaluating the utility of data sharing resources, including the number of datasets, participants, genome sequences, and users. Future research should consider more nuanced measures of the impact of data sharing platforms and registries on preventing unnecessary research, improving the conduct of RCTs, and fast-tracking new discoveries or changes to clinical practice.

Coordinated data sharing for COVID-19 response
Collaboration between data sharing efforts, with a focus on the interoperability of related platforms through the shared use of community-developed standards for participant-level data and metadata, is perhaps the most important area for investment. The aggregation of standardized data across interoperable platforms or registries would help move towards the types of shared global analyses that could meaningfully inform the response to a global pandemic. The application of the same, or interoperable, standards for related study- and participant-level data is a necessary but insufficient condition for inter-platform interoperability. In a few instances, connected platforms mean that data uploaded to one platform are reflected in another (e.g., SARS-CoV-2 OMICs data uploaded to the EMBL-EBI COVID-19 Data Portal or NCBI is included in the INSDC), which enhances data findability and reuse. Large initiatives have emerged to connect platforms and registries within countries and regions, including the Health Data Research UK Innovation Gateway and the European COVID-19 Data Portal. Several initiatives exist to catalogue both COVID-19 data sharing initiatives and datasets (e.g., FAIRsharing; covid19dataindex). Coordination of COVID-19 clinical data sharing initiatives should include: (1) the identification of several core CDMs that can be meaningfully applied to research and EMR data; (2) best practice for governance and for addressing ethical and legal concerns, which can form the basis of an interoperable governance structure and a common approach, where possible, to shared ethical and legal issues; and (3) improved technical approaches for querying related data shared on disparate platforms or registries, including shielded approaches in which participant-level data can be analyzed without being downloaded from the platform. Interoperability-focused initiatives that improve access to FAIR clinical, human and pathogen OMICs, and high-dimensional imaging data should be prioritized to facilitate the global response to VOCs.

Conclusion
Public health emergencies remind the public health and scientific communities of the urgent need to address unresolved barriers to sharing data in the context of infectious disease outbreaks. In contrast to the Zika and Ebola virus outbreaks, COVID-19 has ushered in a new era in which researchers and funders need to shift their focus from supporting data sharing to promoting coordination between data sharing activities. The data sharing community, including funders, researchers, hospital networks, and public health authorities, needs to move from a reactionary, fragmented response to a coordinated, synergistic approach.
Ensuring that data sharing resources are as FAIR as possible, and that they follow best practice for resource governance, transparency, community engagement, applicable legal frameworks, and recommended ethical (e.g., protection of research subjects, ERC review) and equitable (e.g., benefit sharing, community engagement) practice, continues to be a key concern. In particular, interoperability within and between types (e.g., clinical, laboratory, OMICs) and sources (e.g., EMR, research study) of data should be a top priority for current and future epidemics. Cloud-based platforms for data sharing represent a tremendous investment of financial resources and expertise. Clearly elaborated criteria for identifying successful platforms that apply best practice for governance and for addressing ethical concerns, including benefit sharing, while meaningfully engaging the community can help funders focus investment by supporting good practice. While some duplication of effort should be expected, the current ecosystem of 47 registries and 21 platforms for sharing participant-level COVID-19 data that are not interoperable represents a lost opportunity and wasted resources. Given clear criteria for assessing platforms, funders, data-generating groups, and the open science community can focus their efforts on a smaller number of well-supported platforms and registries. Identifying the key political, ethical, administrative, regulatory, or legal motivations for the creation of disparate, non-interoperable platforms for different diseases and data types is important for preventing continued investment in siloed data sharing efforts. Data sharing platforms generally have significant budgets because of the high cost of platform development and maintenance, retrospective data harmonization, and the governance of data sharing.
All data sharing platforms were based in high-income countries, which raises questions of equity in the distribution of resources, concerns about the appropriate representation of the values and preferences of research teams and subjects based in LMIC, and disparities in opportunities to build expertise in data curation and sharing. Data sharing is clearly on the policy agenda. We now need to move from fragmented, overlapping, and competing data sharing efforts to a coordinated nexus of interconnected, longitudinal, participant-level data. Given the formidable barriers to such a cross-regional, cross-discipline initiative, we should start work now to be ready for the next global pandemic.

Competing interests
No competing interests declared.

Figure caption: Qualitative evaluation of FAIRness. a. Disease-specific platform and registry correspondence with the FAIR criteria for data resources. b. Participant-level data types hosted by platforms and registries and correspondence with the FAIR criteria for data resources.

Figure 2