The goal-based FAIRification planning method aims at defining mature FAIRification objectives through iterative steps. From our experience, we have found that starting with small steps and building on them is a more feasible approach than describing objectives from scratch. The method initially targets the most visible characteristics of the FAIRification project, such as the project domain, scope and available resources. It then leverages them to address more complex aspects such as relevant data concepts and competency questions. Finally, by following this structured and incremental approach, the method guides the stakeholders towards the definition of comprehensive objectives that encompass all relevant aspects of FAIRification.
GO-Plan is organised in six phases, namely (i) FAIRification preparation, (ii) assessment of FAIR supporting infrastructure and target resources, (iii) preparation of project stakeholders, (iv) identification of domain scope and groups of reuse stakeholders, (v) FAIRification goals refinement and alignment to target FAIR principles, and (vi) decision-making. The phases are refined in several steps and described in the sections that follow.
A distinction between two categories of stakeholders is made throughout the phases of the method: project stakeholders and reuse stakeholders. The former refers to those who are involved in the FAIRification project and have their own goals and requirements for it (e.g., data custodians, patient representative). The latter refers to those who will eventually reuse the FAIRified resource (e.g., researchers).
The method should be applied once the FAIRification project has been idealised, for instance, when the organisation’s board members have already agreed on FAIRification for a certain need. At this stage, it is assumed that some aspects, such as the group of people that will be involved in the FAIRification project and the target resources, have already been defined. Moreover, the method is applicable to both post-hoc FAIRification [14], where existing resources are made FAIR, and de novo FAIRification [9], where resources are created FAIR (e.g., data made FAIR upon collection).
GO-Plan is aimed at guiding people with varying levels of experience, from beginners to experts in FAIR and in goal-oriented elicitation of objectives. However, people with distinct levels of experience can use the method differently. For instance, a beginner would follow every step of the method to ensure an effective identification of FAIRification objectives. In contrast, an expert leading a FAIRification project would use the method not only to identify and refine the FAIRification objectives, but also to communicate the aspects of FAIRification to the rest of the team. Similarly, researchers, newcomers and educators can use the method as a knowledge source.
The following subsections describe the GO-Plan method using a running example of a research organisation that collects data about patients with rare diseases. This organisation has two aims: (i) to make legacy data FAIR (i.e., post-hoc FAIRification), and (ii) to implement an Electronic Data Capture (EDC) system that creates FAIR data at the point of collection (i.e., de novo FAIRification). In addition to budget and deadline, the most important requirement for this project is the protection of patient privacy through controlled access to the data. The organisation wants to publish non-sensitive data and metadata to foster research on rare diseases.
3.1 Phase 1: FAIRification preparation
As shown in Fig. 1, the method initiates with preparation tasks that entail examining the FAIRification project idealisation documents (e.g., grant proposals, kick-off slides, meeting minutes) and/or holding meetings with related stakeholders (e.g., managers, IT personnel) to identify artefacts that will support subsequent phases. The artefacts produced in this phase are described and exemplified in Table 1.
To illustrate, an analysis of the grant application for the rare diseases registry project is conducted to identify relevant stakeholders and to determine the goals and requirements of the project, as exemplified in Table 1. In addition, conducting interviews with project leaders, patient representatives, and researchers can help to identify additional goals and requirements, as well as to identify what resources need to be made FAIR (i.e., legacy patient data and the EDC system). The organisation’s information technology (IT) team, together with a FAIR expert, can assist in understanding the existing infrastructure (e.g., storage server for data and metadata, long term longevity plan for metadata) and determining the necessary adaptations required to accommodate the resource to be made FAIR (e.g., changes on the data storage format of the EDC system).
3.2 Phase 2: Assessment of FAIRification infrastructure
This phase assesses the resources to be made FAIR and the organisation’s currently available FAIR supporting infrastructure. As shown in Fig. 2, the resources to be made FAIR are accessed (step 2a) and further examined (2b) to check if they can be retrieved (e.g., are they in a SQL server hosted locally? In a USB stick at the researcher’s home office? Can the current EDC system be modified to generate ontologised data?) and understood (e.g., are the headers of CSV files documented? Are the data elements collected by the current EDC system clear enough?).
Table 1
List of artefacts produced in Phase 1.
Artefact | Description | Phase | Examples
Group of stakeholders | List of people actively contributing to the FAIRification project | 3, 6 | Domain expert (clinician, researcher), FAIR project expert, semantic modeller
Project related goals | Goals that can impact or be affected by the FAIRification, which can be extracted from the goals of a larger project that includes FAIRification | 4, 6 | “Collect patient data to foster research with Rare Diseases”
Project requirements | Project requirements that will constrain the FAIRification | 4, 6 | Budget, deadline, data privacy constraints, interoperability requirements (e.g., must interoperate with the EJP RD Metadata Model [8])
List of resources to be made FAIR | Pre-existing resources that will be made FAIR or modified to generate FAIR data during the FAIRification project | 2 | Legacy data, data collection systems, software, ontology catalogues
FAIR supporting infrastructure | Pre-existing infrastructure that has been allocated to accommodate the FAIR resource | 2, 6 | Current storage servers, access control systems, longevity plans
Then, the infrastructure that will accommodate the FAIR resource needs to be reached and accessed (2d) and then analysed (2e) to check if it can be used. The type of infrastructure may vary depending on the type of FAIR resource it is intended to support. For example, to make data FAIR, the infrastructure may include storage servers for data and metadata. In the case of privacy-sensitive data, an access control system must be incorporated. Similarly, to make an ontology FAIR, the infrastructure may involve an ontology repository and a metadata server. In the case of software, it can include a software code repository and a version control system. Additional infrastructure may need to be arranged to achieve the objectives identified at later phases.
The primary aim of this phase is to ensure that both the resources to be made FAIR and the current infrastructure intended to accommodate the FAIR resource do not pose any obstacles to FAIRification. This involves verifying, for instance, the availability and capability of storage servers to handle the data volume associated with FAIRification, among other considerations. If any issues are identified in this phase, they must be addressed before continuing to the next phase (steps 2c and 2f).
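One of the checks in step 2b, whether the headers of a CSV file are documented, can be illustrated with a minimal Python sketch. The column names, the data dictionary and the function `undocumented_headers` are hypothetical and only serve the illustration; GO-Plan itself does not prescribe any tooling for this assessment.

```python
import csv
import io


def undocumented_headers(csv_text: str, data_dictionary: dict[str, str]) -> list[str]:
    """Return the CSV column headers without a non-empty data dictionary entry."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader, [])  # first row holds the headers; empty file -> []
    return [h for h in headers if not data_dictionary.get(h.strip(), "").strip()]


# Hypothetical legacy export and data dictionary (assumptions for illustration).
legacy_csv = "patient_id,dob,dx_code,trt1\n001,1980-02-01,code-x,drug-a\n"
dictionary = {
    "patient_id": "Pseudonymised patient identifier",
    "dob": "Date of birth",
    "dx_code": "Diagnosis code",
}

missing = undocumented_headers(legacy_csv, dictionary)
print(missing)  # ['trt1']
```

A non-empty result would be one of the issues to address in step 2c before moving to the next phase.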
3.3 Phase 3: Preparation of FAIRification stakeholders
The third phase of the method focuses on identifying and preparing the people who will be involved in the FAIRification project. For this, the list of the initial group of related stakeholders is used. The main aim of this task is to bridge the knowledge gap between domain and FAIR experts to prepare them for subsequent phases. The motivation for this comes from the work of Neuhaus & Hastings [15], who suggest techniques to involve stakeholders in the ontology development process. By engaging the groups of stakeholders in each other’s domain, we reuse the authors’ proposed technique of “creating micro-level consensus” (micro-level: project scope), which is expected to establish a more inclusive participatory environment for the discussion of objectives.
In this phase, the group of stakeholders is identified (3a) and categorised into FAIR experts and domain experts (3b). Then, relevant knowledge gaps between them are assessed to an extent that allows for a minimal and sufficient understanding of each other’s expertise (3c). This will create a common “ground language” for stakeholders to communicate their own goals and requirements.
To exemplify, FAIR experts involved in our example project (i.e., rare disease registry FAIRification) could have a question-and-answer session with domain experts about common data elements for rare disease registration [7]. Meanwhile, domain experts get a short lecture on the basics of the FAIR principles and on what can be expected from and done with FAIR data. We emphasise that, for the sake of expectation management, it is important to inform domain experts about what is possible with FAIR and what should not be expected as output from a FAIRification project. For instance, while FAIR data may facilitate it, a data visualization dashboard is an unusual output of FAIRification.
3.4 Phase 4: Identification of domain scope and groups of reuse stakeholders
Phase 4 relies on the premise that reuse is the ultimate aim of FAIR, and therefore the FAIRification objectives must consider eventual reuse scenarios. As shown in Fig. 3, the lists of project goals and requirements are the input to this phase and are used to identify and describe the domain scope (4a). For instance, rare diseases are the domain of the rare disease registry FAIRification project, while the scope refers to a subset of the domain that considers only the terms of interest for the FAIRification project (e.g., information from patients with rare diseases including treatment procedures may be within the scope, while other medical information unrelated to the rare disease might be out of the scope).
This phase also consists of identifying semantic types pertaining to the scope (4b). We refer to semantic types as groups of concepts of similar meaning (e.g., pain is a semantic type that covers similar concepts such as discomfort, ache, and soreness). In our running example, semantic types would include patient, treatment, diagnosis and genetic information. These would also be useful in later stages of FAIRification (i.e., conceptual modelling of (meta)data). Next, in step 4c, the semantic types and their definitions need to be discussed and agreed upon by the group of domain experts. During the agreement process, they may identify additional semantic types to be added to the list.
In step 4d, the description of the domain and semantic types is used to identify reuse stakeholders. To illustrate, a researcher and a healthcare provider are examples of stakeholders who will reuse patient, diagnosis and treatment data from the rare disease patient registry. Next, the expected goals of the reuse stakeholders when reusing the FAIR resource are predicted by the FAIR project stakeholders (4e). For instance, using the data to “identify cohorts for clinical trials” may be a goal of the researcher towards the rare disease patient registry. Other examples of reuse stakeholders can be patient representatives, clinicians and healthcare providers. The list of reuse stakeholders and their goals should also be validated with domain experts (4f).
Note that, in step 4d, a fully comprehensive list of stakeholders should not be expected, as it would be very difficult to predict all eventual reuse cases. However, the FAIR project stakeholders should strive to create a list that considers the relevant expected cases. In our real-world experience (cf. Section 4), we observed that preparing the resource for possible reuse scenarios has a significant impact on the outcome of FAIRification. We also point out that later project extensions to incorporate more reuse cases should be technically feasible given the flexibility of FAIR resources.
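The artefacts of Phase 4 can be recorded in a simple structure, which makes the validation in step 4f easier to track. The following Python sketch is only an illustration under our own assumptions: the class names `DomainScope` and `ReuseStakeholder`, the method `unknown_types` and the "insurer" stakeholder are hypothetical and not part of GO-Plan.

```python
from dataclasses import dataclass, field


@dataclass
class ReuseStakeholder:
    name: str
    semantic_types: set[str]  # semantic types this stakeholder expects to reuse
    expected_goals: list[str] = field(default_factory=list)  # predicted in step 4e


@dataclass
class DomainScope:
    domain: str
    semantic_types: set[str]  # agreed upon by the domain experts in step 4c

    def unknown_types(self, stakeholder: ReuseStakeholder) -> set[str]:
        """Semantic types a stakeholder relies on that are outside the agreed scope."""
        return stakeholder.semantic_types - self.semantic_types


scope = DomainScope(
    domain="rare diseases",
    semantic_types={"patient", "treatment", "diagnosis", "genetic information"},
)
researcher = ReuseStakeholder(
    name="researcher",
    semantic_types={"patient", "diagnosis", "treatment"},
    expected_goals=["identify cohorts for clinical trials"],
)
# Hypothetical stakeholder whose needs fall partly outside the agreed scope.
insurer = ReuseStakeholder(name="insurer", semantic_types={"patient", "billing"})

out_of_scope = {s.name: scope.unknown_types(s) for s in (researcher, insurer)}
print(out_of_scope)  # {'researcher': set(), 'insurer': {'billing'}}
```

A non-empty set is a prompt for discussion with the domain experts: either the scope is extended with the missing semantic type, or the reuse case is left out of the project.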
3.5 Phase 5: FAIRification goals refinement and alignment to FAIR principles
As depicted in Fig. 4, the fifth phase of the method starts by reusing the list of semantic types defined in the previous phase to identify competency questions (CQs) [10] that should be answered by the FAIR resource (5a), including the metadata about the resource. In the context of a FAIRification project, a CQ should be a question that cannot be answered without the FAIR resource or that can be answered in a significantly easier manner. We suggest that CQs elicited in this step should be complex enough to connect and explore the relationship between different semantic types. Table 2 shows some examples of CQs that can be defined for the semantic types exemplified in Section 3.4. In step 5b, the CQs are assigned to related stakeholders (i.e., reuse stakeholders and relevant project stakeholders) and further refined as objectives (5c). These objectives can be identified by asking why a certain CQ needs to be answered and how it can be answered. Some objectives are also exemplified in Table 2.
The objectives identified from the CQs are then aligned with related principles (5d). For this step, it should be identified which FAIR principles will support achieving a specific objective, and how. For instance, the objective “public awareness of rare diseases is improved” (Fig. 5), which is further refined until it can be realised by the task “collect and publish demographic statistics”, may be supported by F2 (rich metadata to make the patient registry findable) and R1.1 (data licence to allow reuse of the data for demographic statistics). Meanwhile, other principles (e.g., F1) may not be prioritised for this specific objective. This reflects the “FAIR enough” aspect of GO-Plan.
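The alignment of step 5d can be kept as a plain mapping from each objective to its supporting principles and the rationale for each. The Python sketch below is a minimal illustration using the example from the text; the function names and the notion of a "deferred" list are our own assumptions, not part of the method.

```python
# Identifiers of the 15 FAIR principles, in their usual order.
FAIR_PRINCIPLES = (
    "F1", "F2", "F3", "F4",
    "A1", "A1.1", "A1.2", "A2",
    "I1", "I2", "I3",
    "R1", "R1.1", "R1.2", "R1.3",
)

# Objective -> {principle: how the principle supports the objective}.
alignment = {
    "Public awareness of rare diseases is improved": {
        "F2": "rich metadata to make the patient registry findable",
        "R1.1": "data licence to allow reuse of the data for demographic statistics",
    },
}


def check_principles(alignment: dict[str, dict[str, str]]) -> None:
    """Raise if an objective refers to an identifier that is not a FAIR principle."""
    for objective, support in alignment.items():
        unknown = set(support) - set(FAIR_PRINCIPLES)
        if unknown:
            raise ValueError(f"{objective}: unknown principle(s) {sorted(unknown)}")


def prioritised_principles(alignment: dict[str, dict[str, str]]) -> list[str]:
    """Principles that support at least one objective, in canonical order."""
    used = {p for support in alignment.values() for p in support}
    return [p for p in FAIR_PRINCIPLES if p in used]


check_principles(alignment)
prioritised = prioritised_principles(alignment)
# Principles not (yet) tied to any objective, i.e. not prioritised for now.
deferred = [p for p in FAIR_PRINCIPLES if p not in prioritised]
print(prioritised)  # ['F2', 'R1.1']
```

With only this one objective recorded, F1 appears in `deferred`, which mirrors the “FAIR enough” reading: a principle is addressed when an objective calls for it.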
Table 2
Examples of competency questions (CQs), related stakeholders and their objectives.
CQ | Stakeholder | Objectives
What is the age range and gender distribution of patients with a particular rare disease in Europe? | Patient Representative, Health Care Provider | “Public awareness of Rare Diseases is improved”, “Patient management is improved”
What previous diagnoses and treatments have been tried for patients with a particular rare disease? | Researcher | “Cohorts for clinical trials are identified”, “Disease progression is predicted”
To facilitate the management of objectives, we suggest the use of goal-modelling techniques such as iStar [4], which helps to capture the stakeholders’ intentions and their relationships in a structured way. Models created with iStar include concepts such as actors, goals, tasks and resources, and relationships such as decomposition and contribution links. The reader is referred to [4] for further information on the iStar syntax.
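To give an impression of how refined objectives can be handled programmatically, the sketch below represents a goal, its task and a needed resource as a small tree in Python. It is a deliberately simplified stand-in and not iStar syntax: the class `Element`, the function `collect` and the resource name are hypothetical, and actors and contribution links are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    kind: str  # "goal", "task" or "resource"
    children: list["Element"] = field(default_factory=list)  # refinements or needed elements


def collect(element: Element, kind: str) -> list[str]:
    """Names of all elements of the given kind in the tree, depth first."""
    found = [element.name] if element.kind == kind else []
    for child in element.children:
        found.extend(collect(child, kind))
    return found


# Goal and task taken from the running example; the resource name is an assumption.
awareness = Element(
    name="Public awareness of rare diseases is improved",
    kind="goal",
    children=[
        Element(
            name="Collect and publish demographic statistics",
            kind="task",
            children=[Element(name="Demographic (meta)data", kind="resource")],
        )
    ],
)

tasks = collect(awareness, "task")
resources = collect(awareness, "resource")
print(tasks)      # ['Collect and publish demographic statistics']
print(resources)  # ['Demographic (meta)data']
```

Listing the tasks and resources reachable from an objective gives a quick overview of what realising that objective would involve.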
The final step of this phase consists of using the list of semantic types to identify related FAIRification projects (5e) through, for instance, the use of FAIR Implementation Profiles (FIPs) [24]. FIPs are specifications of implementation solutions for realising the FAIR principles in a specific context or domain, and their use is intended to foster convergence on FAIR implementation decisions [24]. In the context of GO-Plan, related projects can support collecting implementation solutions that can be reused in the FAIRification project. The EJP RD project [8] is one such project for our running example.
3.6 Phase 6: Decision-making
The sixth and last phase of the method starts by prioritising feasible objectives (6a) given the project requirements (e.g., data privacy) and constraints (e.g., budget, deadline, available expertise). At this point, prioritisation also includes removing objectives that may not be supported by FAIR principles or are not related to FAIRification. Then, the prioritised objectives are further refined (6b) and tasks required to realise them are elicited. Next, the necessary (meta)data for achieving the identified tasks are listed (6c) and described in the goal diagrams as resources, as exemplified in Fig. 5. Finally, the most appropriate solutions for prioritised objectives are identified and selected considering the project goals and requirements (e.g., time and budget), and the limitations of available supporting infrastructure and expertise (6d). This step can be supported by reusing solutions from the similar projects identified in step 5e, by consulting experts or by querying resources such as FAIRSharing [21] and the Smart Guidance RD Wizard [5]. Subsequently, the required expertise for the implementation of the selected solutions (6e) is defined. To illustrate, the reuse of the EJP RD Metadata Model is a possible implementation choice for the objectives depicted in Fig. 5 (in the context of F2 – “Find demographic data about patients”) given the project requirements, and a semantic modelling expert would be a required expertise to support reusing this solution.
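Step 6a can be approximated by a simple filter-and-select procedure, sketched below in Python. This is one possible heuristic under stated assumptions, not the procedure prescribed by GO-Plan: the priority scale, the cost units, the budget, the principles assigned to the second and third objectives, and the dashboard objective are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Objective:
    name: str
    principles: tuple[str, ...]  # supporting FAIR principles from step 5d
    priority: int                # 1 = highest (assumed scale agreed by stakeholders)
    estimated_cost: float        # hypothetical cost units


def prioritise(objectives: list[Objective], budget: float) -> list[Objective]:
    """Drop objectives without FAIR support, then select greedily within budget."""
    supported = [o for o in objectives if o.principles]
    selected, remaining = [], budget
    for o in sorted(supported, key=lambda o: (o.priority, o.estimated_cost)):
        if o.estimated_cost <= remaining:
            selected.append(o)
            remaining -= o.estimated_cost
    return selected


objectives = [
    Objective("Public awareness of rare diseases is improved", ("F2", "R1.1"), 1, 3),
    Objective("Cohorts for clinical trials are identified", ("I1",), 2, 5),
    Objective("Disease progression is predicted", ("I2",), 3, 4),
    # Not supported by any FAIR principle, so it is removed before selection.
    Objective("Data visualisation dashboard is delivered", (), 1, 2),
]

chosen = [o.name for o in prioritise(objectives, budget=8)]
print(chosen)
```

In practice, the selection would be discussed with the project stakeholders; the sketch only shows how requirements and constraints narrow the list before the objectives are refined in step 6b.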
At this point, the goal diagram should contain enough information to inform and guide FAIRification. The FAIRification objectives, tasks and chosen implementation solutions can now be seen as actions to be taken towards realising FAIRification. It is up to the experts conducting the FAIR project to prioritise tasks and define implementation cycles and evaluation cases. We suggest using a FAIRification workflow to organise the FAIRification process that follows.