The reliability of a psychological test concerns the accuracy of the instrument regardless of what it measures[30]. A reliable test measures the psychological construct consistently. Recent research has highlighted two types of indices: internal consistency indices and temporal stability indices. When the test is scored on a continuous scale (such as a Likert-style scale), Cronbach's alpha is recommended[23]. The alpha coefficient varies between 0 and 1, with higher values indicating greater internal consistency. However, a very high value (above 0.95, for example) suggests redundancy among the items, i.e. that some of them measure an overly narrow aspect of the dimension concerned. Values between 0.70 and 0.85 are therefore generally preferred[23, 31]. We compared the internal consistency results of this study with those of Pacifico et al.[16], Boor et al.[2, 16] and Silkens et al.[13] (Table 3). As in those studies, the scores obtained varied between 0.70 and 0.91, indicating good internal consistency of the French version. Temporal stability was evaluated by asking subjects to complete the instrument twice: of the 15 residents we asked, 13 responded. The test-retest correlation coefficient was 0.89, indicating good temporal stability. Temporal stability had not been evaluated in previous DRECT development or validation research.
Confirmatory factor analysis (CFA) is the recommended method for validating the factorial structure of a questionnaire[23, 27, 28]. When a new instrument is developed in the social sciences, the factor structure is first determined by exploratory factor analysis, and the resulting instrument is then validated by confirmatory factor analysis. When validating a translation of an instrument, confirmatory factor analysis is used directly, taking the proposed structure of the original instrument as the model. Although CFA is strongly recommended, it is not systematically used for instrument validation. Pinnock et al. used only internal consistency to validate the adaptation of DRECT in the Australian context[14]. Similarly, Caron et al.[32] used only internal consistency (Cronbach's alpha) to validate the French translation of PHEEM (Postgraduate Hospital Educational Environment Measure).
Although researchers agree that a larger sample size is better for CFA, there is no universal agreement on what size is sufficient. A sample of over 200 is considered acceptable for most models[24, 27, 28]. Other authors have proposed a minimum number of cases per question (five per question)[33, 34]. The sample in our study was 211, which meets both of the aforementioned rules.
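The two rules of thumb can be combined into a simple check. In the sketch below, the absolute minimum of 200 and the five-cases-per-question rule come from the discussion above, while the questionnaire length passed in is an assumed value used only for illustration:

```python
def sample_size_ok(n_respondents: int, n_questions: int,
                   absolute_min: int = 200, cases_per_question: int = 5) -> bool:
    """True if the sample satisfies both rules of thumb: an absolute
    minimum of ~200 cases and at least five cases per question."""
    return (n_respondents >= absolute_min
            and n_respondents >= cases_per_question * n_questions)

# 211 respondents, as in this study; 35 questions is an assumed instrument
# length for illustration.
print(sample_size_ok(211, 35))  # True: 211 >= 200 and 211 >= 5 * 35
```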
There are several goodness-of-fit indices, and most of them can be interpreted as describing the lack of fit of the model to the data[28]. Each type of fit index provides different information about model fit (or misfit), so researchers generally report several indices when evaluating a model. There are many guidelines for an “acceptable” model fit[24, 35]. For this study, we used the threshold values recommended by Brown. These are not rigid guidelines, and Brown notes that his use of "close to" for threshold values is intentional. We found that the SRMR and RMSEA values were adequate, while the CFI and TLI were close to the 0.90 threshold. Silkens et al.[13] reported better fit results, with SRMR and RMSEA of 0.04 (good) and CFI and TLI of 0.92 and 0.91 respectively (acceptable). Boor et al.[2] obtained a CFI of 0.89 (near the acceptable threshold) and an RMSEA of 0.04 (good) and considered this to indicate a good model fit (Table 4). We found only one study that attempted to validate the DRECT instrument in a non-European context[2, 16]. Its authors did not obtain a good fit for the model proposed by Silkens et al.[13] and proposed an alternative 28-question model that yielded better fit. Given our results, and compared with those obtained by Boor and Silkens, we consider that we obtained an adequate model fit. This study therefore validates the French version of the DRECT instrument, allowing its use in French-speaking countries. One of the challenges of medical research is the reproducibility of scientific results; here, reproducing the results obtained by the original authors suggests the robustness of the DRECT instrument and its adaptability to other international residency programs.
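The practice of reporting several indices against heuristic cut-offs can be sketched as a simple screen. The CFI/TLI threshold of 0.90 comes from the discussion above; the SRMR and RMSEA cut-offs below are commonly cited heuristics, and Brown's "close to" caveat applies throughout, so this is a flagging aid rather than a pass/fail rule:

```python
# Heuristic cut-offs only: CFI/TLI near the 0.90 threshold discussed in the
# text, plus commonly cited SRMR <= .08 and RMSEA <= .06 guidelines.
THRESHOLDS = {
    "SRMR":  lambda v: v <= 0.08,
    "RMSEA": lambda v: v <= 0.06,
    "CFI":   lambda v: v >= 0.90,
    "TLI":   lambda v: v >= 0.90,
}

def screen_fit(indices):
    """Flag, for each reported index, whether it meets its heuristic cut-off."""
    return {name: THRESHOLDS[name](value) for name, value in indices.items()}

# Values reported by Silkens et al.: all four indices clear the cut-offs.
print(screen_fit({"SRMR": 0.04, "RMSEA": 0.04, "CFI": 0.92, "TLI": 0.91}))
```

Applied to Boor et al.'s reported values, the same screen would flag the CFI of 0.89 as just below the 0.90 cut-off, matching the "near acceptable threshold" reading in the text.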
One of the limitations of this work was the small size of our sample compared to previous studies. In the development of the first version of the DRECT instrument by Boor et al., 1276 residents participated in the validation study[2]. In the development of the modified version by Silkens et al., 2306 residents participated[2, 13]. Finally, 843 residents participated in the validation of DRECT in the Philippines[16]. Those authors were able to reach such numbers because investigators had access to the contact information of all residents in their countries. In Morocco, faculties of medicine do not maintain an email database of residents in training, which could explain the lower participation compared with other studies. Nevertheless, this limitation is relative, since our sample size is sufficient to carry out a confirmatory factor analysis.