Building data analytics skills for clinical drug development quality professionals - the Data Analytics University training program

Clinical drug development is a complex and extensive process that involves multiple stakeholders alongside patients, requires large capital expenditures and takes nearly a decade on average to complete. To ensure that this process is conducted correctly, rigorous quality activities must be carried out to assess and assure study compliance with Good Clinical and Pharmacovigilance Practices (GxP). For about 25 years, most of these activities have been performed in the form of audits, which involve a high volume of manual work and resources and are reactive by nature. Given the limitations of this approach, together with the intent to leverage new technologies in the data analytics field, a more holistic, proactive and data-driven approach was needed. For this to happen, quality assurance expertise needed to be complemented by data literacy skills. To achieve this, the Data Analytics University (DAU) was created: an in-house training program composed of two pathways that provides a framework for clinical quality staff to develop their data analytics capabilities. The first pathway covers the basics of statistics, probability and data-related terminology, while the second goes deeper into the topics covered in the first, followed by hands-on activities that put the knowledge to the test. After successful completion of 15 DAU sessions, over 310 trained staff were able to apply their learning on data analytics and solve potential issues that might arise with a given dataset. In the near future, the DAU will be made available externally as an e-learning training program.


Background
Clinical drug development is characterised by high attrition rates, large capital expenditures and extensive timelines. Only 11.8% of all drugs entering the clinical drug development phase will be approved, a process which takes 6 to 7 years and requires an investment of around 960 million USD without accounting for the cost of failure [1]. It is a high-stakes game not only for pharmaceutical companies (clinical trial sponsors) but also for patients. To identify the risk of Adverse Drug Reactions (ADRs) [2], drugs are extensively tested in animal models in the pre-clinical study phase. Drugs then enter the clinical study phase, where they are initially tested on a small sample of volunteers, and will only enter the next phase with more subjects if previous results indicate that the drug was safe. In Phase I, drugs are tested on a small sample of healthy volunteers to determine safety and dosage. In Phase II, drugs are tested on a small number of patients to obtain an initial reading on efficacy. Subsequent Phase III trials are larger pivotal trials on a sufficiently large number of patients to adequately prove drug safety and efficacy. A successful Phase III study will usually lead to drug approval by the competent Health Authorities (HA), but sometimes additional post-approval Phase IV studies may be requested. To assess drug efficacy and safety, appropriate data need to be collected in each phase of the clinical trials, often by hundreds of clinical trial sites that can be located all over the globe. This requires well-coordinated data management and analysis plans.
Clinical trial sponsors are therefore required by the International Conference on Harmonization (ICH) guidelines to implement and maintain Quality Assurance (QA) and quality control systems to ensure the rights, safety and well-being of research subjects and the integrity of clinical research data [3]. Following the same principles, Good Clinical Practices (GCP) and Good Pharmacovigilance Practices (GVP) have been established as quality frameworks, the former for clinical trials, the latter for ensuring the safe use of medicinal products once they are available to patients. Preventing harm from adverse reactions in humans arising from the use of authorized medicinal products and promoting their safe and effective use are the fundamental PharmacoVigilance (PV) objectives. Marketing Authorization Holders (MAH) are required to implement and maintain a quality system to fulfil their PV activities [4].
Quality Assurance (QA) teams of pharmaceutical companies conduct activities to assess compliance with GCP and GVP regulated activities. These activities encompass the set-up and management of a Quality Management System (QMS), including training, Standard Operating Procedures (SOPs) and Quality Strategic Activities (QSAs) such as audits. Auditing activities follow well-defined processes, have been implemented for over 25 years (at least since ICH-GCP in 1996) and involve a high volume of manual work. Auditing is a reactive process in that audits are executed based on risk assessed from past data (from several months up to a year old). For large and medium-sized companies, auditing the entire "universe" of clinical trial sites on an annual basis is generally infeasible due to sheer volume, which places an even greater emphasis on a sound and timely risk assessment strategy to ensure QA activities are prioritized to assess the identified risks contemporaneously. A holistic and data-driven approach for QA that could help anticipate and reduce the risk of occurrence of key GCP/PV quality issues, and that could also be used for quality by design, was not available. Furthermore, the audit-based approach required a high level of effort and global travel by quality assurance professionals, which affected their work-life balance and had an ecological impact due to the high volume of transportation required to fulfil the audit schedule.
While the industry has recently been trying to leverage modern developments in data management and IT systems to facilitate the cross-analysis of clinical studies and PV processes [5,6,7,8,9,10,11], a new skill set was needed to build and embed advanced analytics capabilities within its staff. Although clinical QA experts bring a unique skill set, "data literacy" is becoming a necessary core capability for the QA professional of the future. As a result, an in-house training program needed to be developed that would provide a framework for clinical quality staff to develop their analytics capabilities and increase their ability to use data-driven approaches and solutions: the concept of the Data Analytics University (DAU) was created.

Methods And Outline
We designed a program consisting of two consecutive pathways titled Freshman and Graduate. Targeted at quality professionals of all ages and backgrounds (see Fig. 1), the aim was to empower them to make their own self-service data requests and to perform a self-guided descriptive analysis using commonly available spreadsheet software such as Microsoft (MS) Excel. To accomplish this, the program would enable them to develop an understanding of basic data vocabulary in conjunction with a sense for statistical thinking, so that they could effectively use data products, such as dashboards, developed by quality data analysts and scientists. Among the audience of the first pathway we wanted to identify a group of quality data champions who would assist as Subject Matter Experts (SMEs) in the development of such data products. The Freshman pathway would establish a common ground of understanding and the Graduate pathway would help identify and educate data champions (see Fig. 2).

Freshman pathway
The main challenge in designing the Freshman pathway was the heterogeneity of the target audience: some participants had studied statistics (or a similar field), while others had not engaged with statistics since their high-school-level education. We needed to iterate heavily over the basic concepts to educate the latter while still keeping the former engaged and interested, in order to onboard them to our Graduate training and identify them as possible future SMEs. We therefore initially opted for a multi-day, face-to-face, on-site training as opposed to a virtual training or e-learning. This would ensure we had enough time to cover the basics. In addition, by recruiting our data analysts/scientists along with the training designers as trainers, the more advanced participants could actively engage with them and discuss expert topics during breaks.
During the training design of the Freshman pathway we implemented several elements that made the training more interactive and entertaining for participants of all levels: e) A quarter of our training time was dedicated to a hands-on session in which participants first worked in groups to design an analysis plan addressing a relevant business problem and then, in the second part, implemented that analysis in MS Excel. The problem statement allowed the audience to design anything from a simple, basic solution to a very complex and creative one. During these dynamic sessions we made sure that enough trainers (1 trainer per 7 participants) were present to ensure all questions could be answered.
After undertaking the Freshman pathway, participants could apply to enroll on the Graduate pathway. As a prerequisite, they had to demonstrate their learning by submitting examples of how they had used data-driven approaches and solutions in their daily work and how they planned to use analytics in a future project, and they had to take a supplementary class on how analytics can help in decision making. For this pathway's design, we kept the same principles as in the Freshman pathway; however, these classes were focussed on action-oriented problem-solving skills with MS Excel (more than 50%) and on the in-depth understanding of descriptive and visual analytics. The pathway concluded with the completion of a final work-related certification project.
In order to properly recognize the achievement of each participant individually and to ensure that the training met industry quality standards, the program was certified by the CPD Certification Service [13], which issues online certificates that can be included in digital CVs.

Training modules
The topics covered in both pathways were developed jointly by the data analytics/science teams and several business SMEs. They were inspired by other educational formats and materials [14,15] and by probable business problems that could be addressed using data analytics.

Freshman pathway
The contents of this pathway were delivered over 2 full days, divided into 4 modules of about 2 hours each.

Graduate pathway
This pathway started with a prerequisite module that had to be completed offline, followed by 2 full days divided into 3 modules of about 2.5 hours each with breaks in between, and ended with a final Certification Project.
a. Class 1 - Advanced Strategies for Data Analysis: Addressed business problems using advanced MS Excel functions, namely Pivot Tables, advanced formulas and Data Joins.
b. Class 2 - Data Visualisations and More: Basic understanding of Random Variables and Data Distributions, followed by the principles of data visualisation and how to implement them in MS Excel.
c. Class 3 - GxP Problem Solving (with data!!): Students were asked to put their learning to the test by individually solving an analytical problem statement. This was part of the requirements needed to progress onto their Certification Project. The exercise took place within the last class so that students had a chance to raise questions.
d. Certification Project: In this final task students had one more opportunity to demonstrate their newly learned skills, this time on their own real work challenge. They were asked to propose an analytical challenge they were currently facing within their area of expertise (e.g. GCP or GVP) and attempt to solve it by applying the knowledge acquired. This was a "take-home" project that the students had to solve on their own without the instructors' help. After completion, they shared and discussed their work with the other students.
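For readers who think in code rather than spreadsheets, the two operations named in Class 1 (a Pivot Table and a Data Join) can be sketched in a few lines of Python. This is only an illustration of what those operations compute, not material from the DAU, which was taught in MS Excel. The records, the column names and the helper functions `pivot_count` and `left_join` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical audit findings: one row per finding (all values are made up).
findings = [
    {"site_id": "S01", "category": "Informed consent", "severity": "major"},
    {"site_id": "S01", "category": "Data integrity", "severity": "minor"},
    {"site_id": "S02", "category": "Informed consent", "severity": "minor"},
    {"site_id": "S02", "category": "Informed consent", "severity": "major"},
    {"site_id": "S03", "category": "Data integrity", "severity": "minor"},
]

# Hypothetical site master data used as the lookup table for the join.
sites = {
    "S01": {"country": "DE"},
    "S02": {"country": "US"},
}


def pivot_count(rows, row_key, col_key):
    """Count rows per (row_key, col_key) pair, like a pivot table with a count aggregation."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_key]][row[col_key]] += 1
    return {r: dict(cols) for r, cols in table.items()}


def left_join(rows, lookup, key, default=None):
    """Attach lookup columns to each row by key; unmatched rows receive the default columns."""
    joined_rows = []
    for row in rows:
        extra = lookup.get(row[key], default or {})
        joined_rows.append({**row, **extra})
    return joined_rows


pivot = pivot_count(findings, "site_id", "severity")
joined = left_join(findings, sites, "site_id", default={"country": "unknown"})

for site_id, counts in sorted(pivot.items()):
    print(site_id, counts)
```

The join keeps every finding even when the site is missing from the lookup table (here `S03`), which mirrors a spreadsheet lookup formula with a fallback value.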

Evaluation
We performed evaluations on both pathways with two goals in mind. The first was to give attendees a chance to check and demonstrate their knowledge and understanding; the second was to assess the impact the DAU had on their learning and to identify any section of the program that could benefit from improvement (e.g. by spending more time on a topic or finding a more illustrative example).

Freshman pathway
The exam consisted of 14 multiple-choice questions. To earn a Certificate in Data Basics, a pass mark of at least 80% was required.
As the aim of this pathway was to give a common ground of understanding, the answers to the questions were fairly straightforward. The questions focused on ensuring that concepts, definitions and principles were understood. A few questions were also designed for the student to perform some basic analyses that required simple calculations.
Right after the exam's completion, a learning transfer check was performed by running Focus Group sessions in which students formed small groups by work area to identify and come up with real-world scenarios in which to put into practice what they had learned. This activity also encouraged a peer-to-peer knowledge check when ideas were brought up in the discussion, as the small number of members per group allowed everyone to participate actively.
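As a worked illustration of the pass mark, the short Python sketch below computes the smallest number of correct answers that reaches 80% on a 14-question exam. It assumes every question carries equal weight and that the score is simply the share of correct answers; the source does not describe the scoring scheme, and the function name is hypothetical.

```python
def min_correct_answers(total_questions: int, pass_mark_percent: int) -> int:
    """Smallest number of correct answers whose share meets the pass mark."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-total_questions * pass_mark_percent // 100)


needed = min_correct_answers(14, 80)
print(needed, f"{needed / 14:.1%}")
```

Under these assumptions 12 correct answers are needed, since 11 out of 14 is roughly 78.6% and falls short of the threshold.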
Graduate pathway
To assess an individual's capability to solve a data analysis problem on their own, where there is not always a clear problem statement that indicates the start-to-finish path, a typical exam with a set of questions would not be enough. The best way to prove the ability to solve a real data analysis problem is to do precisely that.
To maintain both the motivation and learning impact, we provided an initial analytical problem statement that would be challenging while at the same time achievable.
After successfully completing it, the attendees would then embark on their Certification Project. This time the challenge consisted of finding and successfully resolving an existing data analysis problem in their field of work. The project had to be ambitious enough for them to put the lessons learned into practice in a practical and useful deliverable in MS Excel.
Upon submission to the quality analytics team, the student would have to present their project, walk through the problem found, the reasoning and approach to the solution as well as the impact it had on the business.
Once reviewed and approved, successful students would be awarded their Certificate as Data Analytics University graduates.

Impact
The Freshman and Graduate pathways had a pronounced positive impact on the ways of working of QA professionals. Through the Data Analytics University, we were able to empower QA professionals from different backgrounds and build data literacy among them. Through a review of basic statistics, data-related concepts and an in-depth session on the functionalities of MS Excel, they were able to perform data-driven tasks which improved their ways of preparing and conducting quality activities.
We also saw a significant change in the kind of data-driven questions that were being put to the quality data analysts and quality data scientists within the team. It is important to note that QA experts bring an unmatched skill set to their role, so it is safe to say that they understand the technicalities and challenges of clinical study audits. When these QA experts looked at a clinical study from an analytical and data-oriented perspective, they were able to ask very clear questions that helped identify GCP/PV risks and mitigate them in a timely manner. The initial method of conducting audits was a reactive process and involved hand-picking certain areas of investigation identified on the basis of past experience with a study or a therapeutic area. This method, though tried and tested, did not encompass the entire realm of clinical data for a study. The DAU helped the QA professionals shift the method of audit from a reactive approach to a proactive one. After undergoing the trainings offered as part of the DAU, the QA experts were able to use a data-driven approach to review and identify potential sites and process areas for audits. Multiple QA experts started to regularly conduct risk-based site selection data analyses to determine which sites posed the highest chance of GCP noncompliance for their study. This helped target sites based on data evidence instead of collective historical assumptions.
Towards the end of the Graduate pathway training, we were able to identify several data champions within the QA team who had a keen eye for data and had shown significant development in their analytical skills. Data champions participated as Subject Matter Experts within their teams and led data-driven initiatives for studies.
Results from a survey of the Freshman pathway participants showed that most students found the program useful and applicable to their current positions, which translates into greater proficiency in their daily work (see Fig. 5), e.g. in audit preparation activities. In summary, the outcome of the DAU was that trained staff were able to think strategically about data and potential issues that might arise with a given dataset, to feel empowered by their new knowledge and less intimidated when working with data, and to work efficiently and independently with data to identify anomalies in it.
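To make the idea of a risk-based site selection analysis concrete, the following Python sketch ranks sites by a simple weighted risk score. It is a minimal illustration under stated assumptions and not the method used by the QA experts: the metrics (`protocol_deviations`, `overdue_queries`), the weights and the example values are all hypothetical, and a real analysis would rely on indicators chosen by the study's quality team.

```python
from dataclasses import dataclass


@dataclass
class SiteMetrics:
    site_id: str
    subjects_enrolled: int
    protocol_deviations: int
    overdue_queries: int


# Hypothetical weights: purely illustrative, not taken from the DAU material.
WEIGHTS = {"deviation_rate": 0.7, "query_rate": 0.3}


def risk_score(site: SiteMetrics) -> float:
    """Weighted sum of per-subject rates; higher means higher priority for review."""
    if site.subjects_enrolled == 0:
        return 0.0  # No enrolled subjects: nothing to normalise by.
    deviation_rate = site.protocol_deviations / site.subjects_enrolled
    query_rate = site.overdue_queries / site.subjects_enrolled
    return WEIGHTS["deviation_rate"] * deviation_rate + WEIGHTS["query_rate"] * query_rate


def rank_sites(sites):
    """Return sites sorted from highest to lowest risk score."""
    return sorted(sites, key=risk_score, reverse=True)


sites = [
    SiteMetrics("S01", subjects_enrolled=40, protocol_deviations=4, overdue_queries=10),
    SiteMetrics("S02", subjects_enrolled=10, protocol_deviations=5, overdue_queries=2),
    SiteMetrics("S03", subjects_enrolled=25, protocol_deviations=0, overdue_queries=5),
]

ranked = rank_sites(sites)
for site in ranked:
    print(site.site_id, round(risk_score(site), 3))
```

Normalising by enrolment keeps a small site with many deviations (here `S02`) from being hidden behind a large site with a higher absolute count, which is the kind of evidence-based prioritisation described above.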

Challenges
Some of the challenges faced while conducting DAU trainings stemmed from the different devices in use (for example, Windows vs. Mac) and from different versions of MS Excel. The DAU training consists of several case studies in each module. These case studies are instructor-led: the trainer goes through certain steps within Excel and the trainee must follow along. Because of the different devices and software versions, we observed that these case studies usually took a little longer to get through, as the trainees required additional assistance in following along.
A common challenge for learning programs is the forgetting curve, which was first theorized in 1880 by Hermann Ebbinghaus [16]. As some of the modules of the DAU were spaced in time, and as some of the trainees did not always have an immediate opportunity to apply their learning, we needed to establish a mechanism to counter the forgetting curve. To help trainees retain the materials learned during each pathway, we developed a series of three short email challenges that we called learning boosts: the first one sent two days, the second one two weeks and the third one two months after a learning pathway was completed. After the last learning boost, feedback was sought to ascertain whether trainees had retained the information related to the course.
The DAU trainings were also affected by COVID-19, when all the training sessions had to go completely virtual. To make sure the virtual trainings were successful, we kept the class size small (only 5-6 trainees per class) so that all the trainees could be assisted and monitored effectively. Going virtual also meant handling all technical difficulties remotely. Compared to face-to-face sessions, virtual trainings were definitely more challenging. Moving forward, we plan to mitigate these issues by checking attendees' software versions in advance and providing platform-specific instructions for each, or by moving to an in-browser software alternative instead.
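The learning boost timing can be expressed as a small scheduling routine. The Python sketch below is one possible reading, assuming that all three offsets are counted from the pathway completion date and that "two months" means two calendar months; the helper names and the example date are hypothetical.

```python
import calendar
from datetime import date, timedelta


def add_months(start: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the last day of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def learning_boost_dates(completed_on: date) -> list:
    """Send dates for the three boosts: two days, two weeks and two months after completion."""
    return [
        completed_on + timedelta(days=2),
        completed_on + timedelta(weeks=2),
        add_months(completed_on, 2),
    ]


# Hypothetical completion date, chosen to exercise the end-of-month clamp.
for send_date in learning_boost_dates(date(2021, 12, 31)):
    print(send_date.isoformat())
```

Clamping to the end of the month avoids invalid dates such as 31 February when a pathway finishes on the last day of a long month.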

Conclusion
In this paper, we proposed an educational program to address the existing need to upskill quality professionals with data-related competences. We delivered a fit-for-purpose program: the Data Analytics University. Designed in-house, it aimed to cover all relevant aspects of data analytics while addressing the needs of quality professionals to perform their work more effectively. As a consequence, successful participants no longer needed to request basic data analyses from the analytics team, and subsequent requests were more advanced and analytical. Quality issues were more easily identified with data (as opposed to manual review) by the end user for investigation. Data champions are now 'points of contact' who actively participate in query resolution, helping to reduce the burden on the analytics team. In the near future, the DAU program will become available to other quality professionals and HA inspectors by offering an e-learning version of the Freshman pathway free to members of an independent quality assurance industry association.

Declarations