A goal-oriented method for FAIRification planning

doi:10.21203/rs.3.rs-3092538/v1

Method Article

This work is licensed under a CC BY 4.0 License

Version 1

The FAIR Principles provide guidance on how to improve the findability, accessibility, interoperability, and reusability of digital resources. Since the publication of the principles in 2016, several workflows have been proposed to support the process of making data FAIR (FAIRification). However, to respect the uniqueness of different communities, both the principles and the available workflows have been deliberately designed to remain agnostic in terms of standards, tools, and related implementation choices. Consequently, FAIRification needs to be properly planned in advance, and implementation details must be discussed with stakeholders and aligned with FAIRification objectives. To support this, we describe GO-Plan, a method for identifying and refining FAIRification objectives. Leveraging best practices and techniques from requirements and ontology engineering, the method incrementally elaborates the most obvious aspects of the domain (e.g. the initial set of elements to be collected) into complex and comprehensive objectives. Experience has demonstrated that the definition of clear objectives enables stakeholders to communicate effectively and make informed implementation decisions, such as defining achievement criteria for distinct principles and identifying relevant metadata to be collected. This paper describes the GO-Plan method and reports on a real-world application in the development of a FAIR ontology catalogue.

Software Engineering

FAIR

FAIRification

FAIRification objectives

The vast amount of data generated every day is only valuable if it can be properly interpreted and reused. However, it is humanly unfeasible to manually merge and make sense of all the information that is currently available; the support of machines is therefore required. Although machines can automatically analyse and interpret data to efficiently find useful information, they still require time-consuming human support to prepare and merge data. To address this, the FAIR principles have been proposed to guide the transformation and production of resources that are findable, accessible, interoperable and reusable by humans and machines [28]. FAIR resources can be easily managed by machines with minimal human intervention, thus reducing human workload.

The four letters of FAIR are further decomposed into 15 principles [28]. Findability is enforced by using globally unique and persistent identifiers to refer to data and metadata (F1), describing data with rich metadata (F2), explicitly associating metadata with data (F3), and indexing metadata in searchable resources (F4). Accessibility is achieved by using standardised, open communication protocols for data exchange (A1, A1.1) that allow access authorisation procedures (A1.2) while ensuring the longevity of metadata (A2). Interoperability is enhanced by publishing metadata and data in broadly applicable knowledge representation languages (I1), reusing vocabularies that also follow the FAIR principles (I2), and including qualified references to other metadata and data (I3). Finally, reusability is facilitated by describing metadata and data with accurate and relevant attributes (R1), including usage licences (R1.1), detailed provenance (R1.2) and using domain-relevant community standards (R1.3).
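
As a compact reference, the 15 principles listed above can be written down as a simple lookup table. The following Python sketch is illustrative only: the names `FAIR_PRINCIPLES` and `principles_for` are hypothetical, and the one-line descriptions paraphrase this section rather than quoting the official wording in [28].

```python
# Paraphrased summary of the 15 FAIR principles as described in this section.
# The wording is a shorthand for illustration, not the official text of [28].
FAIR_PRINCIPLES = {
    "F1": "globally unique and persistent identifiers for (meta)data",
    "F2": "data described with rich metadata",
    "F3": "metadata explicitly associated with the data",
    "F4": "metadata indexed in a searchable resource",
    "A1": "standardised communication protocol for data exchange",
    "A1.1": "the protocol is open",
    "A1.2": "the protocol allows access authorisation procedures",
    "A2": "longevity of metadata is ensured",
    "I1": "broadly applicable knowledge representation language",
    "I2": "vocabularies that also follow the FAIR principles",
    "I3": "qualified references to other (meta)data",
    "R1": "accurate and relevant attributes",
    "R1.1": "usage licence",
    "R1.2": "detailed provenance",
    "R1.3": "domain-relevant community standards",
}


def principles_for(letter):
    """Return the identifiers of all principles under one FAIR letter."""
    return [pid for pid in FAIR_PRINCIPLES if pid.startswith(letter)]


print(len(FAIR_PRINCIPLES))
print(principles_for("A"))
```

A table of this kind is convenient later in planning, when objectives are aligned with individual principles rather than with "FAIR" as a whole.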

Data that is made FAIR (FAIRified data) has significant value in many areas. One such area is rare diseases, where projects such as the European Joint Programme on Rare Diseases (EJP RD) [6] interoperate FAIR data and metadata from different institutions for the benefit of rare disease research. Without FAIR, this inherently siloed and dispersed knowledge would be of reduced value, as it would not be large enough to answer research questions on its own.

The FAIR principles have been designed to provide a set of expected behaviours from the data and services ecosystem. Additionally, the FAIRification workflows define the steps to be taken to make resources FAIR (‘FAIRification’). Nonetheless, neither the FAIR principles nor the FAIRification workflows mandate the use of any specific standard, format or software. This is because FAIR and FAIRification have been made agnostic to respect the unique requirements and needs that different communities face when managing and sharing data. Therefore, FAIR can be implemented in different ways and at different levels. However, this flexibility requires careful guidance throughout the FAIRification process to ensure that the implementation decisions (e.g., standards, metadata) align with the FAIRification objectives. In fact, the identification of FAIRification objectives is the initial and crucial step of several FAIRification workflows [23, 9, 14].

The problem of identifying objectives and requirements has been studied by many research communities. Among them is requirements engineering, a community dedicated to studying the identification, refinement, and management of software requirements. The requirements engineering literature reports that the lack of proper planning and refinement of objectives and requirements has a significant impact on the software development process. For instance, Pressman [18] points out that changing requirements after the software product has been delivered can cost 60 to 100 times more than changing a requirement during the software planning phase. We hypothesise that inadequate identification of FAIRification objectives may have a similar impact on planning and executing a FAIRification process. However, there is a lack of research on methods specifically focused on identifying and refining FAIRification objectives for effective FAIRification planning. Furthermore, a recent study on FAIRification challenges concluded that clarifying objectives prior to implementation is a key step in FAIRification, as it helps the team to make informed decisions that are consistent with their objectives [22].

To address the aforementioned gap, we developed GO-Plan (Goal-Oriented FAIRification Planning), a method to plan FAIRification through the identification and refinement of FAIRification objectives. The method reflects our understanding that distinct objectives can have different impacts on the planning and execution of FAIRification. Consequently, resources should be made FAIR at a level that aligns with the specific objectives of the FAIRification project. That is, resources should be made “FAIR enough” to fulfil the objectives of the involved stakeholders. Thus, the FAIRification planning should not only focus on the selection of suitable technologies or standards, but also on prioritising the effort required to raise the FAIR level of the targeted resources. Moreover, as FAIRification is a community-driven, aspirational and incremental process [28], these objectives must encompass the perspectives of both the stakeholders directly participating in the project and relevant external stakeholders (i.e., those who will eventually reuse the resource). As such, each effort undertaken to make a resource FAIR (or more FAIR, i.e., FAIRer) for one’s own objectives will also make that resource FAIRer for others.

GO-Plan was designed based on good practices from requirements engineering (e.g., [26]) while embedding our experiences from FAIRification projects, including training on FAIR (e.g., [3]), and conducting FAIRification within single (e.g., [19]) and among multiple institutions (e.g., [22]). The method focuses on aligning implementation choices—including metadata and data elements to be collected—with the objectives of relevant stakeholders.

We discuss related work in Section 2. Then, we describe GO-Plan and illustrate it with a fictitious running example in Section 3. In Section 4, we report on a real-world application of the method. Finally, Section 5 discusses the strengths and weaknesses of our proposal, as well as implications for future research. In the remainder of this paper, we use the spelling “(meta)data” to refer to both data and metadata. The words “goal” and “objective” are used as synonyms. Note that the literature on FAIRification workflows usually uses the word “objective”, while the requirements engineering literature usually uses the word “goal”.

Several workflows and frameworks have been proposed to support FAIRification in different ways [23]. The generic [14] and the de novo [9] FAIRification workflows define the steps to be followed in the FAIRification of different types of FAIR resources, and both describe the identification of FAIRification objectives as the first step of FAIRification. However, they do not provide detailed guidance on defining these objectives.

Similarly, the FAIRplus FAIRification framework [27] defines steps to be followed during FAIRification and a work plan layout to support organising the FAIR implementation work. The first phase of this framework consists of setting “realistic and practical goals” [27]. In this phase, useful recommendations and examples are provided with a focus on defining an acceptable “FAIR enough” state for the resource to be made FAIR. A valuable recommendation given by FAIRplus is to avoid “the word ‘FAIR’ and its derivatives in goals entirely as it is too general to impart clear meaning” [27]. Despite these useful recommendations, the FAIRplus framework does not structure the definition and refinement of FAIRification objectives or other aspects related to FAIRification planning.

While other FAIRification workflows may also define a step for identifying FAIRification objectives [23], to the best of our knowledge, none of them has provided detailed guidance on defining FAIRification objectives or other FAIRification planning related aspects, such as distinguishing between the different types of stakeholders involved in FAIRification projects.

The goal-based FAIRification planning method aims at defining mature FAIRification objectives through iterative steps. From our experience, we have found that starting with small steps and building on them is a more feasible approach than describing objectives from scratch. The method initially targets the most visible characteristics of the FAIRification project, such as the project domain, scope and available resources. It then leverages them to address more complex aspects such as relevant data concepts and competency questions. Finally, by following this structured and incremental approach, the method guides the stakeholders towards the definition of comprehensive objectives that encompass all relevant aspects of FAIRification.

GO-Plan is organised in six phases, namely (i) FAIRification preparation, (ii) assessment of FAIR supporting infrastructure and target resources, (iii) preparation of project stakeholders, (iv) identification of domain scope and groups of reuse stakeholders, (v) FAIRification goals refinement and alignment to target FAIR principles, and (vi) decision-making. The phases are refined in several steps and described in the sections that follow.

A distinction between two categories of stakeholders is made throughout the phases of the method: project stakeholders and reuse stakeholders. The former refers to those who are involved in the FAIRification project and have their own goals and requirements for it (e.g., data custodians, patient representative). The latter refers to those who will eventually reuse the FAIRified resource (e.g., researchers).

The method should be applied from the moment when the FAIRification project has already been idealised. For instance, when the organisation board members have already agreed on FAIRification for a certain need. At this stage, it is assumed that some aspects, such as the group of people that will be involved in the FAIRification project and the target resources, have already been defined. Moreover, the method is applicable to both post-hoc FAIRification [14], where existing resources are made FAIR, and de novo FAIRification [9], where resources are created FAIR (e.g., data made FAIR upon collection).

GO-Plan is aimed at guiding people with varying levels of experience, from beginners to experts in FAIR and in goal-oriented elicitation of objectives. However, people with distinct levels of experience can use the method differently. For instance, a beginner would follow every step of the method to ensure an effective identification of FAIRification objectives. In contrast, an expert leading a FAIRification project would use the method not only for identifying and refining the FAIRification objectives, but also to communicate the aspects of FAIRification with the rest of the team. Similarly, researchers, newcomers and educators can use the method as a knowledge source.

The following subsections describe the GO-Plan method using a running example of a research organisation that collects data about patients with rare diseases. This organisation has two aims: (i) to make legacy data FAIR (i.e. post-hoc FAIRification), and (ii) to implement an Electronic Data Capture System (EDC) that already creates FAIR data at the point of collection (i.e. de novo FAIRification). In addition to budget and deadline, the most important requirement for this project is the protection of patient privacy through controlled access to the data. The organisation wants to publish non-sensitive data and metadata to foster research on rare diseases.

3.1 Phase 1: FAIRification preparation

As shown in Fig. 1, the method initiates with preparation tasks that entail examining the FAIRification project idealisation documents (e.g., grant proposals, kick-off slides, meeting minutes) and/or holding meetings with related stakeholders (e.g., managers, IT personnel) to identify artefacts that will support subsequent phases. The artefacts produced in this phase are described and exemplified in Table 1.

To illustrate, an analysis of the grant application for the rare diseases registry project is conducted to identify relevant stakeholders and to determine the goals and requirements of the project, as exemplified in Table 1. In addition, conducting interviews with project leaders, patient representatives, and researchers can help to identify additional goals and requirements, as well as to identify what resources need to be made FAIR (i.e., legacy patient data and the EDC system). The organisation’s information technology (IT) team, together with a FAIR expert, can assist in understanding the existing infrastructure (e.g., storage server for data and metadata, long term longevity plan for metadata) and determining the necessary adaptations required to accommodate the resource to be made FAIR (e.g., changes on the data storage format of the EDC system).

3.2 Phase 2: Assessment of FAIRification infrastructure

This phase assesses the resources to be made FAIR and the organisation’s currently available FAIR supporting infrastructure. As shown in Fig. 2, the resources to be made FAIR are accessed (step 2a) and further examined (2b) to check if they can be retrieved (e.g., are they in a SQL server hosted locally? On a USB stick at the researcher’s home office? Can the current EDC system be modified to generate ontologised data?) and understood (e.g., are the headers of CSV files documented? Are the data elements collected by the current EDC system clear enough?).
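
One of the checks mentioned above, whether the headers of a CSV file are documented, lends itself to a small script. The sketch below is a minimal illustration under stated assumptions: the function name `undocumented_headers`, the sample export and the data dictionary are all hypothetical and invented for this example; the method itself does not prescribe any tooling for step 2b.

```python
import csv
import io


def undocumented_headers(csv_text, data_dictionary):
    """Return the CSV column names that have no entry in the data dictionary."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader, [])  # an empty file yields no headers
    return [h.strip() for h in headers if h.strip() not in data_dictionary]


# Hypothetical legacy export and data dictionary (placeholder values only).
legacy_csv = "patient_id,dx_code,trt\nP001,D1,T1\n"
data_dictionary = {
    "patient_id": "Pseudonymised patient identifier",
    "dx_code": "Diagnosis code",
}

missing = undocumented_headers(legacy_csv, data_dictionary)
print(missing)  # columns that still need documentation before FAIRification
```

Any column reported here would be an issue to resolve before moving on to the next phase.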

Table 1

List of artefacts produced in Phase 1.
Artefact	Description	Used in phase(s)	Examples
Group of stakeholders	List of people actively contributing to the FAIRification project	3, 6	Domain expert (clinician, researcher), FAIR project expert, semantic modeller
Project related goals	Goals that can impact or be affected by the FAIRification, which can be extracted from the goals of a larger project that includes FAIRification	4, 6	“Collect patient data to foster research with Rare Diseases”
Project requirements	Project requirements that will constrain the FAIRification	4, 6	Budget, deadline, data privacy constraints, interoperability requirements (e.g., must interoperate with the EJP RD Metadata Model [8])
List of resources to be made FAIR	Pre-existing resources that will be made FAIR or modified to generate FAIR data during the FAIRification project	2	Legacy data, data collection systems, software, ontology catalogues
FAIR supporting infrastructure	Pre-existing infrastructure that has been allocated to accommodate the FAIR resource	2, 6	Current storage servers, access control systems, longevity plans

Then, the infrastructure that will accommodate the FAIR resource needs to be reached and accessed (2d) and then analysed (2e) to check if it can be used. The type of infrastructure may vary depending on the type of FAIR resource it is intended to support. For example, to make data FAIR, the infrastructure may include storage servers for data and metadata. In the case of privacy-sensitive data, an access control system must be incorporated. Similarly, to make an ontology FAIR, the infrastructure may involve an ontology repository and a metadata server. In the case of software, it can include a software code repository and a version control system. Additional infrastructure may need to be arranged to achieve the objectives identified at later phases.

The primary aim of this phase is to ensure that both the resources to be made FAIR and the current infrastructure intended to accommodate the FAIR resource do not pose any obstacles to FAIRification. This involves verifying, for instance, the availability and capability of storage servers to handle the data volume associated with FAIRification, among other considerations. If any issues are identified in this phase, they must be addressed before continuing to the next phase (steps 2c and 2f).

3.3 Phase 3: Preparation of FAIRification stakeholders

The third phase of the method focuses on identifying and preparing the people who will be involved in the FAIRification project. For this, the list of the initial group of related stakeholders is used. The main aim of this task is to bridge the knowledge gap between domain and FAIR experts to prepare them for subsequent phases. The motivation for this comes from the work of Neuhaus & Hastings [15], who suggest techniques to involve stakeholders in the ontology development process. By engaging the groups of stakeholders in each other’s domain, we reuse the authors’ proposed techniques of “creating micro-level consensus” (micro-level: project scope), which is expected to establish a more inclusive participatory environment for the discussion of objectives.

In this phase, the group of stakeholders is identified (3a) and categorised into FAIR experts and domain experts (3b). Then, relevant knowledge gaps between them are assessed to an extent that allows for a minimal and sufficient understanding of each other’s expertise (3c). This will create a common “ground language” for stakeholders to communicate their own goals and requirements.

To exemplify, FAIR experts involved in our example project (i.e., rare disease registry FAIRification) could have a question-and-answer session with domain experts about common data elements for rare disease registration [7]. Meanwhile, domain experts get a short lecture on the basics of the FAIR principles and on what can be expected from and done with FAIR data. We stress that, for the sake of expectation management, it is important to inform domain experts about what is possible with FAIR and what should not be expected as output from a FAIRification project. For instance, while FAIR data may facilitate it, a data visualisation dashboard is an unusual output of FAIRification.

3.4 Phase 4: Identification of domain scope and groups of reuse stakeholders

Phase 4 relies on the premise that reuse is the ultimate aim of FAIR, and therefore the FAIRification objectives must consider eventual reuse case scenarios. As shown in Fig. 3, the list of project goals and requirements is used as input in this phase to identify and describe the domain scope (4a). For instance, rare diseases are the domain of the rare disease registry FAIRification project, while the scope refers to a subset of the domain that considers only the terms of interest for the FAIRification project (e.g., information from patients with rare diseases including treatment procedures may be within the scope, while other medical information unrelated to the rare disease might be out of the scope).

This phase also consists of identifying semantic types pertaining to the scope (4b). We refer to semantic types as groups of concepts of similar meaning (e.g., pain is a semantic type that covers similar concepts such as discomfort, ache, and soreness). In our running example, semantic types would include patient, treatment, diagnosis and genetic information. These would also be useful in later stages of FAIRification (i.e., conceptual modelling of (meta)data). Next, in step 4c, the semantic types and their definitions need to be discussed and agreed upon by the group of domain experts. During the agreement process, they may identify additional semantic types to be added to the list.

In step 4d, the description of the domain and semantic types is used to identify reuse stakeholders. To illustrate, a researcher and a healthcare provider are examples of stakeholders who will reuse patient, diagnosis and treatment data from the rare disease patient registry. Next, the expected goals of the reuse stakeholders when reusing the FAIR resource are predicted by the FAIR project stakeholders (4e). For instance, using the data to “identify cohorts for clinical trials” may be a goal of the researcher towards the rare disease patient registry. Other examples of reuse stakeholders can be patient representatives, clinicians and healthcare providers. The list of reuse stakeholders and their goals should also be validated with domain experts (4f).

Note that a fully comprehensive list of stakeholders should not be expected in step 4d, as it would be very difficult to predict all eventual reuse cases. However, the FAIR project stakeholders should strive to create a list that covers the relevant expected cases. In our real-world experience (cf. Section 4), we observed that preparing the resource for possible reuse scenarios has a significant impact on the outcome of FAIRification. We also point out that later project extensions to incorporate more reuse cases should be technically feasible given the flexibility of FAIR resources.

3.5 Phase 5: FAIRification goals refinement and alignment to FAIR principles

As depicted in Fig. 4, the fifth phase of the method starts by reusing the list of semantic types defined in the previous phase to identify competency questions (CQs) [10] that should be answered by the FAIR resource (5a), including the metadata about the resource. In the context of a FAIRification project, a CQ should be a question that cannot be answered without the FAIR resource or that can be answered in a significantly easier manner. We suggest that CQs elicited in this step should be complex enough to connect and explore the relationship between different semantic types. Table 2 shows some examples of CQs that can be defined for the semantic types exemplified in Section 3.4. In step 5b, the CQs are assigned to related stakeholders (i.e., reuse stakeholders and relevant project stakeholders) and further refined as objectives (5c). These objectives can be identified by asking why a certain CQ needs to be answered and how it can be answered. Some objectives are also exemplified in Table 2.

The objectives identified from the CQs are then aligned with related principles (5d). For this step, the team should identify which FAIR principles support achieving a specific objective, and how. For instance, the objective “public awareness of rare diseases is improved” (Fig. 5), which is further refined until it can be realised by the task “collect and publish demographic statistics”, may be supported by F2 (rich metadata to make the patient registry findable) and R1.1 (data licence to allow reuse of the data for demographic statistics). Meanwhile, other principles (e.g., F1) may not be prioritised for this specific objective. This reflects the “FAIR enough” aspect of GO-Plan.
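
The alignment in step 5d can be recorded as a mapping from each objective to the principles that support it, together with a note on how they do so. The following Python sketch is one possible way to keep such a record; the names `align` and `not_prioritised` are hypothetical, and the data simply restate the example given in this paragraph.

```python
# Identifiers of the 15 FAIR principles.
ALL_PRINCIPLES = (
    "F1", "F2", "F3", "F4", "A1", "A1.1", "A1.2", "A2",
    "I1", "I2", "I3", "R1", "R1.1", "R1.2", "R1.3",
)


def align(objective, supported_by):
    """Record which principles support an objective, and how (step 5d)."""
    unknown = set(supported_by) - set(ALL_PRINCIPLES)
    if unknown:
        raise ValueError(f"Unknown principle identifiers: {sorted(unknown)}")
    return {"objective": objective, "supported_by": dict(supported_by)}


def not_prioritised(alignment):
    """List the principles that are not prioritised for this objective."""
    return [p for p in ALL_PRINCIPLES if p not in alignment["supported_by"]]


awareness = align(
    "Public awareness of rare diseases is improved",
    {
        "F2": "rich metadata to make the patient registry findable",
        "R1.1": "data licence to allow reuse for demographic statistics",
    },
)
print(not_prioritised(awareness))
```

Listing the principles that are not prioritised makes the "FAIR enough" decision explicit, so that it can be revisited in a later iteration.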

Table 2

Examples of competency questions (CQs), related stakeholders and their objectives.
CQ	Stakeholder	Objectives
What is the age range and gender distribution of patients with a particular rare disease in Europe?	Patient Representative, Health Care Provider	“Public awareness of Rare Diseases is improved”, “Patient management is improved”
What previous diagnoses and treatments have been tried for patients with a particular rare disease?	Researcher	“Cohorts for clinical trials are identified”, “Disease progression is predicted”
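
The content of Table 2 can also be kept as structured records, which makes it easy to see which CQs each stakeholder cares about (step 5b). This is a minimal sketch: the record layout and the function name `cqs_by_stakeholder` are assumptions made for illustration, and the table does not state which objective belongs to which stakeholder in the first row, so objectives are stored per CQ only.

```python
from collections import defaultdict

# The two rows of Table 2 as plain records.
COMPETENCY_QUESTIONS = [
    {
        "cq": "What is the age range and gender distribution of patients "
              "with a particular rare disease in Europe?",
        "stakeholders": ["Patient Representative", "Health Care Provider"],
        "objectives": [
            "Public awareness of Rare Diseases is improved",
            "Patient management is improved",
        ],
    },
    {
        "cq": "What previous diagnoses and treatments have been tried for "
              "patients with a particular rare disease?",
        "stakeholders": ["Researcher"],
        "objectives": [
            "Cohorts for clinical trials are identified",
            "Disease progression is predicted",
        ],
    },
]


def cqs_by_stakeholder(records):
    """Group competency questions by the stakeholders they are assigned to."""
    grouped = defaultdict(list)
    for record in records:
        for stakeholder in record["stakeholders"]:
            grouped[stakeholder].append(record["cq"])
    return dict(grouped)


for stakeholder, questions in cqs_by_stakeholder(COMPETENCY_QUESTIONS).items():
    print(stakeholder, len(questions))
```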

To facilitate the management of objectives, we suggest the use of goal-modelling techniques such as iStar [4], which help to capture the stakeholders’ intentions and their relationships in a structured way. Models created with iStar include concepts such as actors, goals, tasks, resources, and relationships such as decomposition and contribution links. The reader is referred to [4] for further information on the iStar syntax.
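
To give a feel for the kind of structure a goal model captures, the sketch below represents actors, goals, tasks and resources as a small tree in Python. It is a deliberately simplified illustration and not the iStar syntax: the classes `Element` and `Actor`, the function `tasks_under` and the resource name are hypothetical, a single `children` list stands in for the different link types, and the intermediate refinements between the goal and the task are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    kind: str  # "goal", "task" or "resource"
    children: list = field(default_factory=list)  # elements linked below this one


@dataclass
class Actor:
    name: str
    goals: list = field(default_factory=list)


def tasks_under(element):
    """Collect the names of all tasks found in the subtree of an element."""
    found = [element.name] if element.kind == "task" else []
    for child in element.children:
        found.extend(tasks_under(child))
    return found


# Running example: one goal, realised by one task that needs one resource.
statistics = Element(
    "Collect and publish demographic statistics",
    "task",
    [Element("Demographic (meta)data", "resource")],  # hypothetical resource name
)
awareness = Element("Public awareness of rare diseases is improved", "goal", [statistics])
representative = Actor("Patient Representative", [awareness])

print(tasks_under(awareness))
```

In practice a dedicated iStar modelling tool would be used; the point here is only that goals, tasks and resources form a navigable structure that later phases can query.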

The final step of this phase consists of using the list of semantic types to identify related FAIRification projects (5e) through, for instance, the use of FAIR Implementation Profiles (FIPs) [24]. FIPs are specifications of implementation solutions for realising the FAIR principles in a specific context or domain, and their use is intended to foster convergence on FAIR implementation decisions [24]. In the context of GO-Plan, related projects can support collecting implementation solutions that can be reused in the FAIRification project. The EJP RD project [8] is one such project for our running example.

3.6 Phase 6: Decision making

The sixth and last phase of the method starts by prioritising feasible objectives (6a) given the project requirements (e.g., data privacy) and constraints (e.g., budget, deadline, available expertise). At this point, prioritisation also includes removing objectives that may not be supported by FAIR principles or are not related to FAIRification. Then, the prioritised objectives are further refined (6b) and tasks required to realise them are elicited. Next, the necessary (meta)data for achieving the identified tasks are listed (6c) and described in the goal diagrams as resources, as exemplified in Fig. 5. Finally, the most appropriate solutions for prioritised objectives are identified and selected considering the project goals and requirements (e.g., time and budget), and the limitations of available supporting infrastructure and expertise (6d). This step can be supported by reusing solutions from the similar projects identified in step 5e, by consulting experts or by querying resources such as FAIRSharing [21] and the Smart Guidance RD Wizard [5]. Subsequently, the required expertise for the implementation of the selected solutions (6e) is defined. To illustrate, the reuse of the EJP RD Metadata Model is a possible implementation choice for the objectives depicted in Fig. 5 (in the context of F2 – “Find demographic data about patients”) given the project requirements, and a semantic modelling expert would be a required expertise to support reusing this solution.
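
Step 6a can be illustrated with a small script that drops objectives not supported by any FAIR principle and then selects the remaining ones in priority order until a budget is exhausted. This is only a sketch under strong assumptions: GO-Plan does not prescribe a selection algorithm, the greedy strategy is a simplification, and the function name `prioritise`, the effort and priority figures, and the principles assigned to the second objective are all invented for illustration.

```python
def prioritise(objectives, budget):
    """Select FAIR-related objectives by priority until the budget runs out."""
    # Remove objectives that no FAIR principle supports (not FAIRification work).
    candidates = [o for o in objectives if o["principles"]]
    # Lower number means higher priority.
    candidates.sort(key=lambda o: o["priority"])
    selected, remaining = [], budget
    for objective in candidates:
        if objective["effort"] <= remaining:
            selected.append(objective["name"])
            remaining -= objective["effort"]
    return selected


# Hypothetical figures; effort is expressed in arbitrary units.
objectives = [
    {"name": "Public awareness of rare diseases is improved",
     "principles": ["F2", "R1.1"], "priority": 1, "effort": 3},
    {"name": "Cohorts for clinical trials are identified",
     "principles": ["I1", "I2"], "priority": 2, "effort": 2},
    {"name": "Data visualisation dashboard is delivered",
     "principles": [], "priority": 3, "effort": 4},
]

print(prioritise(objectives, budget=5))
print(prioritise(objectives, budget=3))
```

In a real project the decision would be taken with the stakeholders; a script of this kind can at most document the outcome and make the trade-offs visible.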

At this point, the goal diagram should contain enough information to inform and guide FAIRification. The FAIRification objectives, tasks and chosen implementation solutions can now be seen as actions to be taken towards realising FAIRification. It is up to the experts conducting the FAIR project to prioritise tasks and define implementation cycles and evaluation cases. We suggest using a FAIRification workflow to organise the FAIRification process that follows.

Our method has been used to improve the FAIRness of a catalogue for ontology-driven conceptual modelling research, henceforth the OntoUML catalogue [2, 20]. It contains a growing set of conceptual models defined either using the OntoUML modelling language [12] or by extending the Unified Foundational Ontology (UFO) [11]. The OntoUML catalogue was initially built using an ad hoc FAIRification workflow, as reported in [2]. Later, the FAIR aspects of the catalogue were reviewed using the method presented in this paper. Due to space limitations, we do not report on all aspects of the project, but a full description of the use of the method is presented in [20].

When employing the method, the FAIRification team could easily navigate through phases 1 to 3 because the FAIRification project of the OntoUML catalogue had already been idealised and previously executed (i.e., [2]). Consequently, most decisions had already been sufficiently discussed. In phase 1, the team elicited the needed artefacts:

Initial objective: to provide knowledge and to get insights on the use of OntoUML language constructs, and recurring patterns and anti-patterns in models across different domains [2].
Group of project stakeholders: a FAIR expert, a FAIR Data Point Expert, semantic modellers, UFO/OntoUML experts and catalogue managers.
Resources to be made FAIR: the models (meta)data, the catalogue itself and its controlled vocabulary.
FAIR supporting infrastructure: FAIR Data Point (FDP) [25], GitHub repository, PURL ¹ and W3ID ².
Project requirements: submissions of new models should be easy, and the catalogue requires low maintenance cost and effort.

In phase 2, no issues were found regarding the supporting infrastructure and resources to be made FAIR. This is because most of the existing infrastructure (e.g. the FDP) is FAIR compliant, and the concepts of OntoUML and UFO (the domain of the catalogue) are clearly and unambiguously defined. In addition, phase 3 was not necessary, as all involved stakeholders already had sufficient knowledge about FAIR and the domain (i.e., OntoUML and UFO).

The scope of the FAIRification project was delimited by the OntoUML metamodel, which also defined the semantic types of interest (e.g. class, relation, generalisation). This facilitated the execution of phase 4.

The team decided not to extract competency questions from semantic types in phase 5, as they already had an initial set of FAIRification objectives defined. Based on their objectives, the team determined that only the catalogue manager, as a project stakeholder, would be included in the set of goal diagrams alongside the reuse stakeholders. This decision was based on the fact that the catalogue manager’s objectives were deemed to be the most pertinent to the project and appeared to encompass the additional objectives of the entire team. For reuse stakeholders, the team identified the ones described in Table 3, and defined their goals based on the catalogue manager’s objectives. For instance, “Domain model become a community reference” is a goal related to the catalogue manager’s objective “model reuse maximised”.

Finally, the team aligned the objectives to related FAIR principles (Fig. 6) and defined implementation solutions for each principle in the context of the referred objectives, as reported in [20]. Although the solutions were not reused from similar projects, the team consulted with experts to identify the most appropriate ones (e.g., using DCAT [1] for defining metadata). A complete description of all objectives pertaining to each stakeholder is available in the project’s documentation.³
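
To illustrate what defining metadata with DCAT might look like, the sketch below builds a minimal JSON-LD description of a single catalogue entry. It is not the metadata schema actually used by the OntoUML catalogue, which is documented in [20]; the function name `dcat_dataset`, the identifier under example.org, the title and the keywords are hypothetical. Only the DCAT and Dublin Core terms themselves (`dcat:Dataset`, `dcat:keyword`, `dct:title`, `dct:license`) are taken from those vocabularies.

```python
import json


def dcat_dataset(identifier, title, licence_url, keywords):
    """Build a minimal JSON-LD record that describes a resource with DCAT terms."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:license": {"@id": licence_url},
        "dcat:keyword": list(keywords),
    }


# Hypothetical catalogue entry, invented for illustration.
record = dcat_dataset(
    identifier="https://example.org/models/example-model",
    title="Example conceptual model",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["OntoUML", "conceptual model"],
)
print(json.dumps(record, indent=2))
```

A record of this kind touches several principles at once: the identifier relates to F1, the descriptive fields to F2, and the licence to R1.1.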

Table 3

Reuse stakeholders identified in the OntoUML catalogue FAIRification project and the objectives identified for each of them.
Reuse Stakeholder	Initial objectives
Newcomer	“Proficiency in OntoUML increased”
Modeller	“Have domain of interest defined by a model”, “Domain model become a community reference”, “Domain model reuse maximised”
Tool Developer	“Algorithm developed”, “Have algorithm evaluation reproducible”
Researcher	“OntoUML language improved”, “OntoUML be used appropriately”

[1] https://purl.archive.org/

[2] https://w3id.org/

[3] https://purl.org/ontouml-models

The method presented in this paper is defined as a sequence of phases and steps. However, we have observed that real-world applications, such as the one described in Section 4, may benefit from an iterative approach. For instance, in a first iteration, the process of creating the competency questions can raise the need to include more semantic types, which can be addressed in a second iteration. It is up to the FAIRification team to decide how many iterations should be performed considering the project constraints (especially budget and time).

Additionally, distinct FAIRification iterations can be tailored to address the specific needs and considerations of different stakeholders, thereby defining different levels of FAIR and related aspects for them. That is particularly valuable, for instance, when dealing with sensitive data (e.g. some types of users have access to different portions of data) or with FAIRification projects involving non-public data (e.g. from private companies), where certain reuse stakeholders might have limited access to (meta)data.

We acknowledge the need for a more detailed evaluation of the expected benefits of the method when compared to ad hoc FAIRification. We are currently working on evaluating GO-Plan from two perspectives: usability and quality of the output. On usability, we are studying the perception of users when using the method (i.e., “is it easier to define FAIRification objectives using GO-Plan compared to ad hoc FAIRification planning?”). With respect to the quality of the output, we assess whether users produce a FAIR resource using the method that is more in line with the FAIRification objectives compared to resources produced after ad hoc FAIRification planning. For instance, we expect that users will be able to make more informed implementation decisions based on a clearer understanding of the FAIRification objectives. In addition, we emphasise that the method is based on techniques from software engineering (e.g., goal modelling [26]) and ontology engineering (e.g., competency questions [10] and microlevel consensus [15]), which have already been evaluated and used in several real-world applications (e.g., [17, 13]).

When applying the method to a real-world use case, we observed that defining reuse stakeholders had a significant influence on the results of FAIRification, particularly when identifying which (meta)data concepts should be collected and published, as well as on considerations regarding licensing and provenance. We attribute this impact to the fundamental emphasis of FAIR on facilitating reusability and assert that optimising the resource for reuse cases is key to effective FAIRification. Furthermore, we observed that using goal model diagrams facilitated the communication among stakeholders.

When comparing the real-world use case with [20] and without [2] the use of our method, we noticed that our approach led to more informed and clearer decision-making and evaluation of the FAIRness of the catalogue. The stakeholders were able to prioritise solutions based on a comprehensive understanding of the relationship between objectives and the FAIR principles. To illustrate, the use of our method resulted in a re-definition of the metadata concepts to be collected, a reprioritisation of the principles (e.g., more attention was given to R1), and the inclusion of FAIR supporting infrastructure such as the FDP. Finally, we observed that the objectives helped stakeholders in establishing achievement criteria for principles that lacked sufficient precision. For instance, the team was able to define that the metadata set satisfied the “data are described with rich metadata” (F2) principle by ensuring that it supported all prioritised goals from the reuse stakeholders.
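As an illustration only, the F2 achievement criterion described above can be read as a coverage check: the metadata set is considered sufficient when every prioritised goal of the reuse stakeholders is supported by it. The following minimal sketch makes that reading concrete. The goal descriptions come from Table 3, whereas the `Goal` class, the function names, and the assignment of metadata fields to goals are hypothetical and are not taken from the catalogue project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """A prioritised goal of a reuse stakeholder."""
    stakeholder: str
    description: str
    # Metadata fields assumed necessary to support the goal (hypothetical).
    required_fields: frozenset


def unsupported_goals(goals, metadata_fields):
    """Return the goals whose required fields are not all in the metadata set."""
    available = set(metadata_fields)
    return [g for g in goals if not g.required_fields <= available]


def satisfies_f2(goals, metadata_fields):
    """True when every prioritised goal is supported by the metadata set."""
    return not unsupported_goals(goals, metadata_fields)


# Goal descriptions follow Table 3; the field assignments are assumptions.
goals = [
    Goal("Modeller", "Domain model reuse maximised",
         frozenset({"dct:title", "dct:license", "dcat:keyword"})),
    Goal("Tool Developer", "Have algorithm evaluation reproducible",
         frozenset({"dct:issued", "dct:modified", "dcat:distribution"})),
]

# A candidate metadata set that does not yet cover the second goal.
metadata_set = {"dct:title", "dct:license", "dcat:keyword", "dct:issued"}

print(satisfies_f2(goals, metadata_set))
for goal in unsupported_goals(goals, metadata_set):
    missing = sorted(goal.required_fields - metadata_set)
    print(goal.stakeholder, goal.description, missing)
```

Under these assumptions, a failed check points directly at the goals and missing fields that the team would need to discuss, which is the kind of traceability between objectives and principles that the goal diagrams are meant to provide.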

The main aim of the work presented in this paper is to help all FAIR enthusiasts to better define clear FAIRification objectives that can lead to successful FAIRification. Nonetheless, we argue that communities should actively endeavour to share their FAIRification planning artefacts (e.g., goal diagrams, implementation decisions, FIPs) in order to promote standards convergence, disseminate solutions to implementation challenges, and share experiences so that others can prepare and execute FAIRification in a faster and more seamless way. To support this, we propose that FAIRification plans, including goals and mappings to related principles, should also be made FAIR. In addition to that, we emphasise the publication of FAIR implementation decisions (i.e. FIPs) as an effective means to gradually diminish the work for subsequent projects and (re)users. This will allow future work to focus on creating a catalogue of FAIRification plans and associated concrete tasks that can lead to improved automation.

Competing interests: The authors declare no competing interests.

Acknowledgements

We thank the LUMC Biosemantics and the EJP RD FAIRification Stewards groups for constant feedback on this research. This initiative has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement N°825575 and the Trusted World of Corona (TWOC; LSH Health Holland).

Albertoni, R., Browning, D., Cox, S., et al.: The W3C Data Catalog Vocabulary, version 2: Rationale, design principles, and uptake. arXiv preprint arXiv:2303.08883 (2023)
Barcelos, P.P.F., Sales, T.P., Fumagalli, M., et al.: A FAIR model catalog for ontology-driven conceptual modeling research. In: Conceptual Modeling. ER 2022. vol. 13607, p. 3–17. Springer (2022)
Bernabé, C.H., Thielemans, L., Carta, C., et al.: Building expertise on FAIR through evolving Bring Your Own Data (BYOD) workshops: Describing the data, software, and management focused approaches and their evolution (2023), manuscript in preparation
Dalpiaz, F., Franch, X., Horkoff, J.: iStar 2.0 language guide. arXiv preprint arXiv:1605.07767 (2016)
van Damme, P., Alarcón Moreno, P., Cámara Ballesteros, A., Bernabé, C.H., Le Cornec, C.M.A., Dos Santos Vieira, B., van der Velde, K.J., Zhang, S., Carta, C., Cornet, R., ’t Hoen, P.A., Jacobsen, A., Swertz, M.A., Roos, M., Benis, N.: A resource for guiding data stewards to make European rare disease patient registries FAIR. Data Science Journal (2023), manuscript submitted for publication
EJP RD: European Joint Programme on Rare Diseases. https://www.ejprarediseases.org/ (2020), accessed: April 24, 2023
EU RD Platform: Set of common data elements. https://eu-rd-platform.jrc.ec.europa.eu/set-of-common-data-elements_en (accessed 2023)
European Joint Programme for Rare Diseases: EJP-RD VP Resource Metadata Schema. https://github.com/ejp-rd-vp/resource-metadata-schema (2021), accessed on April 24, 2023
Groenen, K.H., Jacobsen, A., Kersloot, M.G., dos Santos Vieira, B., van Enckevort, E., Kaliyaperumal, R., Arts, D.L., ’t Hoen, P.A., Cornet, R., Roos, M., et al.: The de novo FAIRification process of a registry for vascular anomalies. Orphanet Journal of Rare Diseases (2021)
Grüninger, M., Fox, M.S.: The role of competency questions in enterprise engineering. Benchmarking—Theory and practice (1995)
Guizzardi, G., Botti Benevides, A., Fonseca, C.M., et al.: UFO: Unified Foundational Ontology. Applied Ontology 17(1), 167–210 (2022)
Guizzardi, G., Fonseca, C.M., Benevides, A.B., et al.: Endurant types in ontology-driven conceptual modeling: Towards OntoUML 2.0. In: Conceptual Modeling. ER 2018. vol. 11157, p. 136–150. Springer (2018)
Horkoff, J., Aydemir, F.B., Cardoso, E., et al.: Goal-oriented requirements engineering: An extended systematic mapping study. Requirements engineering 24, 133–160 (2019)
Jacobsen, A., Kaliyaperumal, R., Bonino da Silva Santos, L.O., Mons, B., Schultes, E., Roos, M., Thompson, M.: A generic workflow for the data FAIRification process. Data Intelligence (2020)
Neuhaus, F., Hastings, J.: Ontology development is consensus creation, not (merely) representation. Applied Ontology (2022), preprint
OMG: Business Process Model and Notation (BPMN), Version 2.0 (January 2011), http://www.omg.org/spec/BPMN/2.0
Pacheco, C., García, I., Reyes, M.: Requirements elicitation techniques: A systematic literature review based on the maturity of the techniques. IET Software (2018)
Pressman, R.S.: Software engineering: A practitioner’s approach. McGraw-Hill, 7th edn. (2010)
Queralt-Rosinach, N., Kaliyaperumal, R., Bernabé, C.H., et al.: Applying the FAIR principles to data in a hospital: challenges and opportunities in a pandemic. Journal of Biomedical Semantics (2022)
Sales, T.P., Barcelos, P.P.F., Fonseca, C.M., et al.: A FAIR catalog of ontology-driven conceptual models (2023), manuscript submitted to Data & Knowledge Engineering
Sansone, S.A., McQuilton, P., Rocca-Serra, P., et al.: FAIRsharing as a community approach to standards, repositories and policies. Nature Biotechnology (2019)
dos Santos Vieira, B., Bernabé, C.H., Zhang, S., et al.: Towards FAIRification of sensitive and fragmented rare disease patient data: Challenges and solutions in European Reference Network registries. Orphanet Journal of Rare Diseases 17, 436 (2022)
dos Santos Vieira, B., Bernabé, C.H., Henriques, I., Zhang, S., Camara, A.B., García, J.A.R., van der Velde, J., van Damme, P., Moreno, P.A., Benis, N., Strubel, J., Schoots, F., L’Henaff, P., ’t Hoen, P., Roos, M., Jacobsen, A., Cornet, R., Wilkinson, M.D., Schaefer, F., Swertz, M., Jetten, M.: Critical steps towards large-scale implementation of the FAIR data principles (Mar 2023), https://doi.org/10.5281/zenodo.7867293
Schultes, E., Magagna, B., Hettne, K.M., et al.: Reusable FAIR implementation profiles as accelerators of FAIR convergence. In: Advances in Conceptual Modeling. ER 2020. vol. 12584. Springer (2020)
Bonino da Silva Santos, L.O., Burger, K., Kaliyaperumal, R., et al.: FAIR data point: A FAIR-oriented approach for metadata publication. Data Intelligence pp. 1–21 (2022)
Van Lamsweerde, A.: Goal-oriented requirements engineering: A guided tour. In: Proceedings Fifth IEEE International Symposium on Requirements Engineering. pp. 249–262. IEEE (2001)
Welter, D., Juty, N., Rocca-Serra, P., Xu, F., Henderson, D., Gu, W., Strubel, J., Giessmann, R.T., Emam, I., Gadiya, Y., et al.: FAIR in action - a flexible framework to guide FAIRification. Scientific Data 10(1), 291 (2023)
Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., et al.: The FAIR guiding principles for scientific data management and stewardship. Scientific data (2016)
