Model
The French CML Health (CMLH) model3,15 is a nine-level reading grid that iteratively breaks down the innovation process along three interconnected axes: needs, technology, and programmatic. It is a direct descendant of the original CML model developed by the American National Aeronautics and Space Administration16, which covered only the last two domains; the CMLH model adds a user-centered axis and specifies the technology and programmatic domains to fit French and European regulatory requirements in research methodology and data management (Table 1).
| Domain | Sub-domains |
| --- | --- |
| Needs | Usage; Market; Clinical proofs |
| Technology | Technical development; Data management; Intellectual property |
| Programmatic | Project management; Regulation; Funding |

Table 1: Factorial structure of the French CML Health model, from (Trognon, 2023)
The first axis of the CML Health model (Table 2) is derived from the Technology Readiness Level model17. It evaluates the development of technological concepts, product management, and ownership through three formalized sub-axes: technological development, data management, and intellectual property. The technological development sub-axis gradually assesses processes on a scale of 1 to 9, from evaluating the current state of the art through critical functionality simulations, up to managing the product life cycle. The data management sub-axis focuses on how project leaders will handle data from their own devices, encompassing R&D data, protection protocols, and automation of product life-cycle data. Lastly, the intellectual property sub-axis provides insights into competitive analysis, monitoring of existing patents, and management of potential infringements.
| Level | Technological development | Data management | Intellectual property |
| --- | --- | --- | --- |
| CML1 | Evaluation of state-of-the-art | N/A | Monitoring of patents |
| CML2 | Conceptualization and theoretical analysis | Collection of R&D data | Patents pending approval |
| CML3 | Functional simulation and testing | Organization of software data | Specific granted patents |
| CML4 | Creation of software demonstrator | Cybersecurity measures | Ensuring freedom of operation |
| CML5 | Development of alpha prototype | Availability of data | Specific granted patents |
| CML6 | Technological analysis for improvement | Utilization of clinical data; intellectual property of clinical data | Ensuring freedom of operation |
| CML7 | Automation of function testing | Access to data servers | N/A |
| CML8 | Addressing software bugs and issues | Implementation of data collection devices | N/A |
| CML9 | Management of product lifecycle | Generation of material-epidemiology data | Competitive intelligence |

Table 2: Example of the different maturity levels for each sub-area of the "Technological Maturity" domain, adapted from (Trognon, 2023)
The second axis of the CML Health model (Table 3) represents the first adaptation of the CML model developed by the American National Aeronautics and Space Administration. It incorporates project management and regulatory aspects, here tailored to European and French requirements for devices (such as market access under CE-mark certification) and research. The aim is to adhere to Good Clinical Practice (GCP) and to the ethical principles safeguarding individuals participating in research, protecting them from the potential risks associated with acquiring new biological or medical knowledge; such research is conducted on healthy or sick volunteers with the intention of advancing knowledge in the biological or medical fields. The French regulatory framework is based on European regulations, with recent updates for innovations seeking medical device status (the ANSM's proposed "clinical investigations" categories). Furthermore, this axis ensures that methods for collecting and processing health data comply with the General Data Protection Regulation (GDPR) and with the French reference research methodologies (MR-00X), which range from level 1 to 3.
| Level | Project management | Regulation | Funding |
| --- | --- | --- | --- |
| CML1 | Identifying the driving factors | Establishing regulatory framework | Identifying potential funding sources |
| CML2 | Conducting initial project risk analysis | Ensuring compliance with the GDPR | Preparing the business plan |
| CML3 | Setting up test beds | Analyzing product risks | Planning financing for project demonstrators |
| CML4 | Identifying complementary skills | Assessing ethical aspects of the product | Formalizing the business plan |
| CML5 | Creating a detailed development plan | Collecting regulatory data | Developing financial models |
| CML6 | Updating project elements and addressing risks | Consolidating the technical file | Establishing a Minimum Viable Business Model |
| CML7 | Identifying marketing and sales skills | Compiling the CE mark file | Initiating Series A capital raising |
| CML8 | Finalizing and closing the project | Defining regulatory framework for data use | Updating economic assumptions with real-life data |
| CML9 | Reviewing industrial development partnerships | Renewing the CE marking | N/A |

Table 3: Example of the different maturity levels for each sub-area of the "Programmatic Maturity" domain, adapted from (Trognon, 2023)
The Programmatic axis of the CML Santé France model (Table 3) assesses the programmatic maturity of the project across three areas: project management, regulatory aspects, and financial aspects. The project management sub-axis evaluates the project consortium, from pilot identification to development-partnership renewal, including the creation of test beds, and examines the nature of the partnerships formed. The regulatory sub-axis assesses programmatic maturity from the initial legal investigation surrounding the project to CE-mark renewal, encompassing product risk analysis and compliance with European (e.g., MDR, GDPR) and French (i.e., ethics, clinical investigations for medical devices) regulatory constraints. Lastly, the funding sub-axis allows for a gradual evaluation of financial aspects, from identifying potential financing sources to updating the business's economic assumptions based on real-life usage data from the device.
| Level | Usage | Market | Clinical proofs |
| --- | --- | --- | --- |
| CML1 | Understanding social and public health context | Reviewing the existing market literature | Review of the clinical literature |
| CML2 | Identifying practice situations that justify the need | Identifying the value proposition | Identifying the medical need |
| CML3 | Collaboratively developing tailored usage scenarios | Defining product positioning and expected impact | Formulating the clinical strategy |
| CML4 | Conducting UX/UI lab evaluations | Quantifying the expected impact | Initiating preliminary clinical trials |
| CML5 | Defining the usage industrialization scheme | Developing market access strategy | Analyzing results from clinical trials |
| CML6 | Assessing usability and acceptability | Characterizing the device based on usage surveys | Drafting study reports (publications) |
| CML7 | Evaluating ecological impact of a pre-series | Implementing marketing elements (deployment, export) | Conducting multi-center clinical trials |
| CML8 | Examining real-life organizational impact | Refining go-to-market strategies by customer type | Performing medico-economic studies |
| CML9 | Ensuring quality control of patient-reported experience | Marketing across different markets | Ensuring quality control of patient-reported outcomes |

Table 4: Example of the different maturity levels for each sub-area of the "Need Maturity" domain (PREM: Patient Reported Experience Measure; PROM: Patient Reported Outcome Measure), adapted from (Trognon, 2023)
Finally, the last axis of the CML Santé France model represents the consortium's true innovation in specifying the CML model as described by NASA. It incorporates elements of consumer behavior theory18, particularly the barriers to innovation, and enables the evaluation of maturity in terms of needs across three sub-axes: usage, market, and clinical proofs (Table 4). The usage sub-axis provides insights into the device's value and ensures user-centered product development in terms of uses. It assesses development from identifying the social context and public health implications to evaluating the perceived quality of care through patient evaluation methods (PREM), and verifies the elimination of certain functional barriers, such as conflicts with established usage patterns. The market sub-axis examines the competitive landscape concerning device uses, including market literature reviews and evaluations of market-segment diversity and the respective access strategies; it confirms the elimination of functional value barriers and verifies the uniqueness of the device's value proposition. Finally, the clinical proofs sub-axis assesses the quality of the clinical investigations conducted on the device, ranging from a comprehensive literature analysis to formalized processes for evaluating the perceived quality of the device's results by patients (PROM). This sub-axis addresses the functional barrier of uncertainty that arises when end-users have limited access to devices under development, as described by Stone and Grønhaug19.
Subjects
Participant data were collected during sessions organized between 2021 and 2023, including auditions of innovative project leaders in the field of medical technologies (MedTech). In chronological order, the data come from: two individual semi-directive interviews carried out in preparation for the methodological deployment; five consortium auditions organized during the Future4Care 2022 start-up competition; three semi-directive group interviews carried out during the French National Digital Health Sector's Call for Expressions of Interest; and a day of presentations (symposium) as part of the e-Meuse Expert Committee meeting, whose aim is to identify the optimal conditions for deploying digital innovation in healthcare in eastern France.
All participants received detailed information on the objectives and purpose of the study, and ethical consent was obtained online in accordance with the Declaration of Helsinki. The study protocol was approved by the Institutional Review Board Commission Nationale de l'Informatique et des Libertés (registration no. 2230503).
Study design
The verbatim records of the pitches and of the individual and group interviews were recorded and then transcribed according to the formalism of the 2TK model20,21, i.e., transcribed into speech acts sequenced by the pauses identified in the discourse, in order to reproduce its kinetics as closely as possible and to identify the most circumscribed packets of information possible. Each speech act was then labeled according to three parameters: the factor (three levels: need, programmatic, technology; otherwise "Null"); the sub-factor (three levels per factor: market, usage, and clinical evaluation for the "Need" factor; technological development, intellectual property, and data management for the "Technology" factor; and funding, regulatory aspects, and project management for the "Programmatic" factor); and the CML level (nine levels: CML1 to CML9). In total, n = 10,952 speech acts were labeled; only the non-Null speech acts (n = 2,070) were then retained.
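As an illustration of this labeling and filtering step, the sketch below uses hypothetical field names and toy utterances (none of which come from the study's actual data schema or corpus):

```python
# Sketch of the speech-act labeling scheme described above.
# Field names and example utterances are illustrative assumptions.

from dataclasses import dataclass
from typing import Optional

@dataclass
class SpeechAct:
    text: str
    factor: Optional[str]      # "Need", "Programmatic", "Technology", or None ("Null")
    sub_factor: Optional[str]  # e.g. "Market", "Funding", "Data management"
    cml_level: Optional[int]   # 1..9

def retain_non_null(acts):
    """Keep only speech acts that carry information about a CML Health factor."""
    return [a for a in acts if a.factor is not None]

# Toy corpus: three labeled acts and one "Null" act
corpus = [
    SpeechAct("We filed a patent last year", "Technology", "Intellectual property", 3),
    SpeechAct("Our business plan is drafted", "Programmatic", "Funding", 2),
    SpeechAct("Clinicians confirmed the need", "Need", "Clinical proofs", 2),
    SpeechAct("Uh, let me check my slides", None, None, None),
]

retained = retain_non_null(corpus)
print(len(retained))  # three of the four toy acts carry factor information
```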
These speech acts were used to build a two-layer computational pipeline with two complementary and distinct objectives: a first "expert systems" layer of CatBoost22–24 algorithms, one per factor, each computing the probability that a speech act carries information about the given CML Health factor rather than belonging to another factor or to noise; and a second layer consisting of an artificial neural network that embeds the probabilities from each expert system and provides a final decision regardless of the textual material provided. This second layer is used to predict the membership factor of the questionnaire items from which the speech acts were labeled.
Scale development & item generation
The items were developed directly from the Concept Maturity Levels Santé grid of the Forum Living Lab Santé Autonomie (version 2021/12; Supplementary File 1). We generated one item for each criterion, with the aim of eliciting a personal representation of the project's status, from which the participant could form a judgment between the induced representation and their subjective perception of progress. A total of 178 items were generated from this grid (French: Additional File 2; English: Additional File 3).
We chose to base the questionnaire on an Osgood-type scale, constructing the lower and upper bounds in a logic of continuity, where the lower bounds of items belonging to higher levels can be superimposed to some extent on the upper bounds of lower-level items. Similarly, we chose a linear 6-point scale in order to avoid neutrality bias, although the existence of this bias is still debated25,26. In addition, the stimuli were built around first-person personal pronouns, which allows greater immersion of subjects in the situational context about which they are being questioned27 and seems particularly appropriate for a maturity assessment of project concepts.
However, not all sentences describing criteria within the original grid were unambiguous with respect to their specific factors, necessitating a few adjustments to ensure their alignment with the intended factor. For instance, the phrase "The technical tools that can meet the needs are clearly identified" belongs to the "Technology" factor in the questionnaire, even though it contains the word "need". In this case, the criterion was reworded in the scale as "The appropriate tools to meet the technical requirements are clearly identified." Once the item adaptation was completed, the corpus underwent annotation, and classification was performed using the computational pipeline described in the relevant section.
Computational content validation
Computational content validation is an innovative procedure that uses machine-learning techniques to assess the content validity of a measurement instrument, scale, or questionnaire, that is, to verify whether the instrument's items or questions adequately reflect the concept they are designed to measure.
Within this framework, an algorithmic model is trained on a portion of the verbatim data obtained from discussions focused on project maturity, and its performance is then evaluated on a different portion of the data (the test set). The aim is first to demonstrate that, from the words spoken by a participant in a given sample, the model can reliably and accurately predict the dimensions or factors underlying the discourse that are superimposable on the CML Health questionnaire criteria; and then to demonstrate that the model can correctly identify these dimensions in a new data set (i.e., the questionnaire), providing an additional segment of evidence for the questionnaire's content validity. This approach thus offers a computational alternative to traditional content-validation methods, which are usually based on a pre-test of the questionnaire; in the case of computational content validation, this step can be carried out under pre-test conditions.
Factorial structure and content validation
The method employed in this study aims to circumvent the limitations inherent in classical psychometric validation by using a computational approach. The following describes how this approach can overcome these constraints while providing relevant information on factor structure, internal consistency, and reliability.
Factorial structure: Using a machine-learning model, we can identify which factors (in our case, the categories "Technology", "Programmatic", and "Need") are relevant to each statement. This is an indirect way of exploring the factor structure of the dataset: if the model can predict these factors correctly, it suggests that they are well defined and have real meaning in the context of technological innovation in healthcare.
Internal consistency and reliability: Both are assessed through the performance of the machine-learning model. In this context, internal consistency is reflected in the model's ability to make consistent predictions on different parts of the training dataset. Reliability, on the other hand, is assessed by examining the model's performance on a novel test dataset: accurate and consistent predictions on this data set suggest that the model is reliable.
This approach has the advantage of not requiring a large number of participants, which is a major constraint of classical psychometric validation. Moreover, the machine-learning model can be re-trained as new data are collected, enabling continuous improvement in accuracy and reliability.
It is important to note that this computational approach is not a substitute for conventional psychometric validation. Rather, it offers a practical alternative when the data collection required for conventional validation is not feasible. It allows conclusions to be drawn from a fundamentally smaller but potentially information-richer data set, with the possibility of continuous updating as new data are collected. It can also be superimposed on conventional validation techniques, providing a new body of evidence for the validity of the measurements produced by these tools.
Computational content validation procedure
The data used in this study consisted of text transcribed from meetings and presentations using the 2TK method20,21. For the analysis, three experts assigned labels to the transcribed phrases based on the relevant factors of the CML Health model (Need, Programmatic, and Technology). To represent the transcribed phrases in a format suitable for analysis, the Bag of Words technique was applied; this approach transforms the text into numerical vectors, enabling subsequent computational processing and modeling.
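As a minimal, from-scratch illustration of the Bag of Words step (the study's actual tokenization and tooling may differ), each phrase can be mapped to a vector of word counts over a shared vocabulary:

```python
# Toy Bag of Words vectorizer: each phrase becomes a vector of word counts
# over a shared vocabulary. Illustrative only; the study's preprocessing
# (tokenization, casing, stop words, etc.) may differ.

from collections import Counter

def build_vocabulary(phrases):
    """Map each distinct word in the corpus to a fixed vector index."""
    vocab = sorted({word for phrase in phrases for word in phrase.lower().split()})
    return {word: i for i, word in enumerate(vocab)}

def vectorize(phrase, vocab):
    """Return the word-count vector of a phrase over the shared vocabulary."""
    counts = Counter(phrase.lower().split())
    return [counts.get(word, 0) for word in sorted(vocab, key=vocab.get)]

phrases = ["the patent is pending", "the clinical need is clear"]
vocab = build_vocabulary(phrases)
vectors = [vectorize(p, vocab) for p in phrases]
print(vectors)  # [[0, 0, 1, 0, 1, 1, 1], [1, 1, 1, 1, 0, 0, 1]]
```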
The proposed model comprises two layers of classification (as shown in Figure 1): a first layer that provides the probability that a transcribed phrase belongs to a given factor of the CML metric, and a second layer that provides the predicted label of the transcribed phrase based on those probabilities and on the corresponding transcribed phrase.
Figure 1: Computational analysis pipeline
In order to determine the most suitable classifier for our study, a comparison was performed among various classifiers, including Decision Tree, Bagging (comprising 500 decision trees), Extra Trees, Gradient Boosting, AdaBoost, CatBoost, and Random Forest (Figure 2). The comparison was conducted using a 2/3 train and 1/3 test split of the data, allowing us to identify the classifier with the highest performance and accuracy. Following the evaluation, the CatBoost classifier emerged as one of the top-performing classifiers, demonstrating a high level of accuracy in classifying the transcribed phrases. The decision to select CatBoost was further supported by its flexibility in handling categorical data, a relevant characteristic for our study.
Figure 2: Model selection for building the computational pipeline.
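The comparison step can be sketched roughly as follows, using scikit-learn classifiers on toy data as stand-ins (CatBoost ships in its own package, and the study's real features were Bag of Words vectors, so this illustrates the procedure rather than reproducing the experiment):

```python
# Rough sketch of the model-comparison step on toy data with a 2/3 train,
# 1/3 test split. scikit-learn stand-ins replace CatBoost, which comes
# from its own package (catboost).

from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy 3-class dataset standing in for the labeled Bag of Words vectors
X, y = make_classification(n_samples=300, n_features=20, n_classes=3,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)

candidates = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Bagging (500 trees)": BaggingClassifier(n_estimators=500, random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

scores = {name: accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
          for name, clf in candidates.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```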
To optimize the performance of the CatBoost classifier, its hyperparameters were fine-tuned using Bayesian optimization, ensuring that they were set to obtain improved classification results. These hyperparameters (and their corresponding ranges) were: the number of estimators ([100, 1500]), the learning rate ([0.01, 0.3]), the depth of each decision tree ([4, 10]), the L2 regularization coefficient ([1, 10]), the randomness strength ([0, 1]), the temperature parameter for the bagging process ([0, 10]), and the border count parameter for categorical features ([32, 255]). The number of initial points for the Bayesian optimization was set to five and the number of iterations to 20. The CatBoost classifier was then used to generate the probability of each category (Pr(Need), Pr(Programmatic), and Pr(Technology)) for the transcribed text; these probabilities indicate the likelihood of each category's presence in the transcribed phrases.
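The search space above can be written down directly. The parameter names below follow CatBoost's usual identifiers (`n_estimators`, `learning_rate`, `depth`, `l2_leaf_reg`, `random_strength`, `bagging_temperature`, `border_count`); the exact spelling used in the study is an assumption, and any Bayesian optimization library exposing bounded continuous ranges could consume an equivalent specification:

```python
# Hyperparameter search space for the CatBoost classifier, with the
# ranges stated in the text. Names follow CatBoost's conventions but
# are assumptions about the study's exact configuration.

search_space = {
    "n_estimators":        (100, 1500),  # number of boosting estimators
    "learning_rate":       (0.01, 0.3),
    "depth":               (4, 10),      # depth of each decision tree
    "l2_leaf_reg":         (1, 10),      # L2 regularization coefficient
    "random_strength":     (0, 1),       # randomness strength
    "bagging_temperature": (0, 10),      # temperature of the bagging process
    "border_count":        (32, 255),    # border count for feature splits
}

INIT_POINTS = 5  # initial random evaluations of the objective
N_ITER = 20      # subsequent Bayesian optimization iterations

for name, (lo, hi) in search_space.items():
    print(f"{name}: [{lo}, {hi}]")
```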
Furthermore, to enhance the classification process, a neural network was trained on the probabilities generated by the CatBoost classifier in addition to the transcribed text. A separate set of data was reserved for testing the performance of this neural network layer.
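A rough sketch of this second layer, with toy data and a scikit-learn MLP standing in for the study's actual network:

```python
# Sketch of the second layer: a small neural network taking the first
# layer's per-factor probabilities together with the Bag of Words vector
# and outputting the final factor label. Toy random data and a
# scikit-learn MLP are stand-ins for the study's actual inputs and model.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n, n_words = 120, 30

bow = rng.integers(0, 3, size=(n, n_words))    # toy Bag of Words vectors
probs = rng.dirichlet(np.ones(3), size=n)      # toy Pr(Need/Programmatic/Technology)
y = probs.argmax(axis=1)                       # toy final labels

X = np.hstack([probs, bow])                    # concatenate both inputs
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X[:90], y[:90])                        # training portion
preds = net.predict(X[90:])                    # held-out portion, as in the study
print(preds.shape)
```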