The final Minho-SAS, composed of 15 dichotomous yes/no items and a global performance score ranging from 1 to 5, is shown in Table 1, together with the number of positive assessments per item in the OSCE.
Table 1
Minho Suture Assessment Scale translated into the English language.
| Question | Item | Yes/No | Nº of positive assessments |
| --- | --- | --- | --- |
| 1 | Confirms and prepares the necessary material | Y/N | 242 |
| 2 | Hold the needle correctly with the needle holder | Y/N | 251 |
| 3 | Handles the needle holder correctly | Y/N | 259 |
| 4 | Handles the tissue forceps correctly | Y/N | 245 |
| 5 | Suture at the correct distance from the edge AND at the same distance on both sides | Y/N | 221 |
| 6 | Demonstrates surgical dexterity in the entry and exteriorization and handling of the needle at the exit of the skin | Y/N | 181 |
| 7 | Performs knot, counter-knot and knot correctly. | Y/N | 249 |
| 8 | Correctly cuts the thread with proper length and technique. | Y/N | 206 |
| 9 | Places the knot lateral to the wound | Y/N | 199 |
| 10 | Places the appropriate degree of tension on the knot. | Y/N | 209 |
| 11 | Correct distance between sutures | Y/N | 209 |
| 12 | Performs suturing safely | Y/N | 236 |
| 13 | Safely store suture equipment and needle at the end of the procedure. | Y/N | 166 |
| 14 | Performs the suture with dexterity, economy and fluidity of movements. | Y/N | 164 |
| 15 | Respects the tissues | Y/N | 228 |
| Global | Global Assessment Score | 1 to 5 | - |
For the initial validation, face validity was assessed by showing the final prototype to experienced surgeons with expertise in the skill being assessed, who gave a unanimous positive judgement.
To apply Item Response Theory (IRT), and Rasch analysis in particular, unidimensionality must first be confirmed. Using jamovi, we performed a residuals analysis and a principal component analysis (PCA), shown in Tables 2 and 3, respectively. Residual correlation (Q3) values > 0.3 indicate that the assumption of local independence is compromised; as can be seen in Table 2, this does not occur for almost all item pairs in our data, with only the x6-x14 pair (0.326) marginally above this value. The PCA revealed a single dominant component, as evidenced by the substantial loadings of the variables onto this component. This finding is consistent with the assumption of unidimensionality, indicating that a single factor can explain most of the variance in our dataset.
Table 2
Dichotomous Rasch Model - Residuals Analysis using Q3 Correlation Matrix
| | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | x14 | x15 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| x1 | — | | | | | | | | | | | | | | |
| x2 | -0.070 | — | | | | | | | | | | | | | |
| x3 | -0.094 | -0.047 | — | | | | | | | | | | | | |
| x4 | 0.015 | 0.183 | 0.101 | — | | | | | | | | | | | |
| x5 | -0.047 | -0.006 | -0.107 | -0.063 | — | | | | | | | | | | |
| x6 | -0.062 | 0.017 | 0.059 | 0.019 | -0.244 | — | | | | | | | | | |
| x7 | -0.015 | -0.178 | 0.007 | -0.073 | -0.141 | 0.036 | — | | | | | | | | |
| x8 | -0.206 | 0.002 | -0.004 | -0.169 | -0.002 | -0.071 | -0.085 | — | | | | | | | |
| x9 | -0.233 | -0.121 | -0.001 | -0.111 | -0.182 | -0.114 | 0.222 | -0.106 | — | | | | | | |
| x10 | -0.175 | -0.005 | -0.050 | -0.042 | -0.002 | -0.186 | -0.011 | -0.088 | 0.079 | — | | | | | |
| x11 | -0.055 | -0.086 | -0.040 | -0.028 | 0.165 | -0.165 | 0.002 | -0.097 | -0.016 | -0.081 | — | | | | |
| x12 | 0.072 | -0.142 | -0.049 | 0.025 | -0.104 | 0.024 | -0.105 | -0.147 | -0.123 | -0.043 | -0.169 | — | | | |
| x13 | -0.008 | 0.051 | 0.055 | -0.130 | -0.062 | -0.211 | -0.012 | -0.167 | 0.068 | -0.050 | -0.157 | 0.036 | — | | |
| x14 | -0.108 | -0.048 | 0.016 | -0.134 | -0.074 | 0.326 | -0.034 | -0.079 | -0.166 | -0.130 | -0.110 | 0.028 | -0.035 | — | |
| x15 | -0.022 | -0.071 | -0.056 | -0.079 | -0.088 | 0.069 | 0.002 | 0.042 | -0.012 | 0.016 | -0.002 | -0.034 | -0.312 | -0.013 | — |
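As a rough illustration of how a Q3 value such as those in Table 2 can be obtained, the sketch below correlates the residuals (observed response minus model-expected probability) of two items. The responses and probabilities are hypothetical and not taken from the study, and applying the 0.3 threshold to the absolute value is an assumption made here for illustration.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def q3(responses_i, probs_i, responses_j, probs_j):
    """Q3 for an item pair: correlation of the residuals (observed - expected)."""
    resid_i = [x - p for x, p in zip(responses_i, probs_i)]
    resid_j = [x - p for x, p in zip(responses_j, probs_j)]
    return pearson(resid_i, resid_j)

# Hypothetical data for four students (NOT from the study).
item_a = [1, 0, 1, 1]
item_b = [1, 0, 0, 1]
probs_a = [0.8, 0.4, 0.6, 0.9]  # model-expected probabilities for item a
probs_b = [0.7, 0.3, 0.5, 0.8]  # model-expected probabilities for item b

q3_ab = q3(item_a, probs_a, item_b, probs_b)
# Assumption: the 0.3 cut-off from the text is applied to the absolute value.
flagged = abs(q3_ab) > 0.3
print(f"Q3 = {q3_ab:.3f}, flagged = {flagged}")
```

In practice the expected probabilities would come from the fitted Rasch model, and the calculation would be repeated for every item pair to fill the lower triangle of the matrix.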
Table 3
Principal Component Analysis
| | Component 1 | Uniqueness |
| --- | --- | --- |
| x1 | | 0.961 |
| x2 | 0.440 | 0.807 |
| x3 | 0.490 | 0.760 |
| x4 | 0.424 | 0.820 |
| x5 | 0.425 | 0.819 |
| x6 | 0.628 | 0.606 |
| x7 | 0.563 | 0.683 |
| x8 | 0.362 | 0.869 |
| x9 | 0.546 | 0.702 |
| x10 | 0.546 | 0.702 |
| x11 | 0.503 | 0.747 |
| x12 | 0.456 | 0.792 |
| x13 | 0.469 | 0.780 |
| x14 | 0.640 | 0.590 |
| x15 | 0.575 | 0.669 |
Tables 4 and 5 summarize the model fit analysis using Item Response Theory (IRT). Table 4 reports, among other indices, the AIC (Akaike Information Criterion) and the SABIC (Sample-Size Adjusted Bayesian Information Criterion), statistical measures used to compare different models; lower values indicate better fit. According to Baker et al., item discrimination values of 0.01–0.34 are considered very low; 0.35–0.64 low; 0.65–1.34 moderate; 1.35–1.69 high; and 1.70 and above very high [27].
Table 4
Model Fit Indices for Rasch and 2PL Analyses of the Minho-SAS. AIC: Akaike Information Criterion; SABIC: Sample-Size Adjusted Bayesian Information Criterion; HQ: Hannan-Quinn Information Criterion; BIC: Bayesian Information Criterion; Df: Degrees of freedom
| | AIC | SABIC | HQ | BIC | logLik | χ² | df | p |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Rasch | 3282.212 | 3288.997 | 3305.310 | 3339.727 | -1625.106 | NA | NA | NA |
| 2PL | 3261.493 | 3274.215 | 3304.802 | 3369.334 | -1600.746 | 48.7191 | 14 | < .0001 |
Table 4 compares the relative fit of the two models. The 2-PL model outperforms the Rasch model, with lower AIC, SABIC, and HQ values, whereas the BIC, which penalizes additional parameters more heavily, is lower for the Rasch model. The likelihood-ratio chi-square test is significant (χ² = 48.72, df = 14, p < .0001), indicating that the 2-PL model fits significantly better, with a better trade-off between fit and complexity on most criteria. The findings of both models support validity based on internal structure.
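The indices in Table 4 can be approximately reproduced from the two log-likelihoods, as in the minimal sketch below. The parameter counts (16 and 30) and the sample size (269) are not stated in the text; they are assumptions inferred here from the reported AIC, BIC, and degrees of freedom. The chi-square tail probability uses a closed form that is valid only for even degrees of freedom.

```python
import math

def information_criteria(log_lik, n_params, n_obs):
    """Return (AIC, BIC) for a fitted model."""
    aic = 2 * n_params - 2 * log_lik
    bic = n_params * math.log(n_obs) - 2 * log_lik
    return aic, bic

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0  # j = 0 term of the finite series
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

# Log-likelihoods as reported in Table 4.
LOGLIK_RASCH = -1625.106
LOGLIK_2PL = -1600.746
# ASSUMPTIONS (inferred, not stated in the text): parameter counts and sample size.
K_RASCH, K_2PL, N_OBS = 16, 30, 269

aic_rasch, bic_rasch = information_criteria(LOGLIK_RASCH, K_RASCH, N_OBS)
aic_2pl, bic_2pl = information_criteria(LOGLIK_2PL, K_2PL, N_OBS)

# Likelihood-ratio test of the 2-PL model against the nested Rasch model.
lr_stat = 2 * (LOGLIK_2PL - LOGLIK_RASCH)
lr_df = K_2PL - K_RASCH
p_value = chi2_sf_even_df(lr_stat, lr_df)

print(f"AIC: Rasch {aic_rasch:.3f}, 2PL {aic_2pl:.3f}")
print(f"BIC: Rasch {bic_rasch:.3f}, 2PL {bic_2pl:.3f}")
print(f"LR chi-square = {lr_stat:.2f}, df = {lr_df}, p = {p_value:.2e}")
```

Under these assumptions the AIC favors the 2-PL model while the BIC favors the Rasch model, because the BIC penalty per parameter, ln(n), is larger than the AIC penalty of 2.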
Table 5
Difficulty and Discrimination Parameters for Each Item in Rasch and 2-PL Models
| Item | Rasch model: Difficulty | 2-PL model: Difficulty | 2-PL model: Discrimination |
| --- | --- | --- | --- |
| 1 | 2.8 | 2.301 | 0.525 |
| 2 | 3.32 | 3.37 | 1.38 |
| 3 | 4.02 | 5.08 | 2.16 |
| 4 | 2.95 | 2.8 | 1.14 |
| 5 | 1.98 | 1.768 | 0.919 |
| 6 | 0.948 | 1.34 | 2.52 |
| 7 | 3.18 | 3.92 | 2.06 |
| 8 | 1.55 | 1.327 | 0.774 |
| 9 | 1.37 | 1.34 | 1.25 |
| 10 | 1.63 | 1.58 | 1.23 |
| 11 | 1.63 | 1.52 | 1.08 |
| 12 | 2.52 | 2.51 | 1.31 |
| 13 | 0.628 | 0.573 | 1.013 |
| 14 | 2.21 | 0.925 | 2.935 |
| 15 | 1.75 | 2.47 | 1.7 |
Table 5 reveals interesting insights. In the Rasch model, item difficulty parameters range from 0.628 (item 13) to 4.02 (item 3). This wide distribution suggests that the Minho-SAS contains items with progressive difficulty levels, which is useful for assessing different levels of proficiency. In the 2-PL analysis, the item difficulty values range from 0.573 (item 13) to 5.08 (item 3), corroborating the Rasch analysis. Interestingly, the difficulty of item 14 shifts between the Rasch and 2-PL analyses (from 2.21 to 0.925), which could indicate that the item is very challenging. Discrimination parameters range from 0.525 (item 1) to 2.935 (item 14); items with higher values are more effective in distinguishing suture skill levels and contribute more to the accuracy of the Minho-SAS. In this case, we have a broad spectrum of low to very high discrimination values, according to current literature [30].
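To make the spread of discrimination values concrete, the sketch below labels each 2-PL discrimination from Table 5 with the categories quoted earlier from Baker et al. Treating the category boundaries as half-open intervals (for example, "low" as 0.35 up to but not including 0.65) is an assumption made here so that every value falls into exactly one category.

```python
from collections import Counter

def discrimination_label(a):
    """Map a discrimination value to a category (boundaries are an assumption)."""
    if a < 0.35:
        return "very low"
    if a < 0.65:
        return "low"
    if a < 1.35:
        return "moderate"
    if a < 1.70:
        return "high"
    return "very high"

# 2-PL discrimination parameters for items 1 to 15, as listed in Table 5.
DISCRIMINATION = [0.525, 1.38, 2.16, 1.14, 0.919, 2.52, 2.06, 0.774,
                  1.25, 1.23, 1.08, 1.31, 1.013, 2.935, 1.7]

labels = {item: discrimination_label(a)
          for item, a in enumerate(DISCRIMINATION, start=1)}
counts = Counter(labels.values())

for item, label in labels.items():
    print(f"item {item:2d}: {label}")
print(dict(counts))
```

With these cut-offs, most items fall in the moderate category, with one low item and several very high ones.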
The Item Characteristic Curves (ICCs) for each item, as estimated by the Rasch model and the 2-PL model, are depicted in Figs. 1 and 3, respectively. Each curve illustrates the probability of a positive assessment on the item at different levels of the latent trait. The x-axis represents the latent trait continuum, indicating increasing levels of proficiency, while the y-axis shows the probability of endorsing the item. For example, for a student with Θ (theta) = 0, who would be an average student, the probability of getting item 14 (v14) correct would be 0.6. When Θ is -2 the probability is almost 0.2, and when Θ is +4 the probability is almost 1. The Item Information Curves (IICs), as estimated by the Rasch model and the 2-PL model, are depicted in Figs. 2 and 4, respectively.
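A minimal sketch of how an ICC is traced follows, using the textbook 2-PL form P(Θ) = 1 / (1 + exp(-a(Θ - b))). The parameterization reported by the software is not stated in this section, so the item below uses hypothetical values (a = 1.5, b = -0.5) rather than the entries of Table 5.

```python
import math

def icc_2pl(theta, a, b):
    """Probability of a positive assessment at ability theta.

    a: discrimination, b: difficulty (textbook 2-PL parameterization).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def icc_rasch(theta, b):
    """Rasch ICC: the 2-PL curve with discrimination fixed at 1."""
    return icc_2pl(theta, 1.0, b)

# HYPOTHETICAL item parameters, chosen only for illustration.
A_DEMO, B_DEMO = 1.5, -0.5

# Evaluate the curve on integer ability levels from -4 to 4.
curve = [(theta, icc_2pl(theta, A_DEMO, B_DEMO)) for theta in range(-4, 5)]
for theta, prob in curve:
    print(f"theta = {theta:+d}: P = {prob:.3f}")
```

In this form the probability is exactly 0.5 when Θ equals the difficulty b, and a larger a makes the curve steeper around that point.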
In the ICC curves, item 3 exhibits a tall and narrow curve, indicating that it is very precise in measuring abilities, while item 6 has a broader curve, suggesting that it provides information across a wider range of skills. Items with higher peaks are well suited for discriminating between individuals with similar abilities. Items with steeper curves, such as item 3, exhibit higher discrimination values and are effective in separating students by suture skill level. Items with shallower curves, like item 5, have lower discrimination values. The peak of the curve represents the item's difficulty: the peak of item 2 is at a lower ability level, indicating an easy item, while that of item 4 is at a higher ability level, indicating a more challenging item. The ICCs also show the probability that students with varying skills receive a positive item assessment: the item 6 curve shows a higher probability of success for students with higher skill levels, reflecting a more challenging item. In contrast, the item 15 curve suggests a less challenging item.
As for the IICs, the amount of information obtained from the Minho-SAS peaks around Θ = -2 in the Rasch analysis and around Θ = 0 in the 2-PL analysis, and decreases outside this range of ability. This means that, in the Rasch analysis, the Minho-SAS is most informative for students with lower suture skill proficiency, whereas the 2-PL analysis captures more detail for students of average ability.
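The idea of an information peak can be sketched with the standard 2-PL item information function, I(Θ) = a² P(Θ)(1 - P(Θ)), summed over items to give the information of the whole scale. The three items below are hypothetical and use the textbook parameterization; they are not the Minho-SAS estimates.

```python
import math

def item_information(theta, a, b):
    """2-PL item information: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def scale_information(theta, items):
    """Information of the whole scale: sum of the item informations."""
    return sum(item_information(theta, a, b) for a, b in items)

# HYPOTHETICAL (discrimination, difficulty) pairs, for illustration only.
DEMO_ITEMS = [(1.0, -2.0), (1.5, -1.0), (2.0, 0.0)]

# Search a grid of ability values from -6 to 6 for the information peak.
grid = [x / 10 for x in range(-60, 61)]
peak_theta = max(grid, key=lambda t: scale_information(t, DEMO_ITEMS))

print(f"scale information peaks near theta = {peak_theta:.1f}")
```

Each item contributes most information at its own difficulty, with a maximum of a²/4, so highly discriminating items dominate where the scale as a whole is most informative.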
In Rasch analysis, item fit statistics, displayed in Table 6, indicate how accurately the data fit the model. Outfit can be understood as a measure of how well an item matches the pattern of responses expected by the Rasch model. If the outfit value is too high or too low, the item might be too difficult or too easy compared with the prediction. Infit focuses on the consistency of responses and is more sensitive to the pattern of responses. If the infit value is too high or too low, the item could be confusing or not aligned with the others. Fit values should be between 0.5 and 1.5 [31]. Items with infit or outfit > 2.0 should be excluded [32,33]. In our work, the outfit statistics range from 0.479 to 1.253, which indicates acceptable variability in the behavior of the items. Similarly, the infit statistics range from 0.954 to 1.247, suggesting good levels of response consistency. The mean values for outfit (0.782) and infit (0.971) indicate an overall alignment of items with the Rasch model, though with some variability in their fit.
Table 6
Item Fit Statistics for Rasch Model
| | Outfit | Infit |
| --- | --- | --- |
| Minimum | 0.479 | 0.954 |
| Maximum | 1.253 | 1.247 |
| Mean | 0.782 | 0.971 |
| Standard Deviation | 0.195 | 0.118 |
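For readers unfamiliar with the two fit statistics summarized in Table 6, the sketch below computes them for one item using their usual mean-square definitions: outfit is the unweighted mean of the squared standardized residuals, and infit is the sum of squared residuals divided by the sum of the model variances. The ability values, difficulty, and responses are hypothetical, and the Rasch form P = 1 / (1 + exp(-(Θ - b))) is assumed.

```python
import math

def rasch_prob(theta, b):
    """Rasch probability of a positive assessment."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, thetas, b):
    """Return (outfit, infit) mean squares for one dichotomous item."""
    sq_resid_sum = 0.0   # sum of squared raw residuals
    variance_sum = 0.0   # sum of model variances p * (1 - p)
    z_sq_sum = 0.0       # sum of squared standardized residuals
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        var = p * (1.0 - p)
        sq_resid_sum += (x - p) ** 2
        variance_sum += var
        z_sq_sum += (x - p) ** 2 / var
    outfit = z_sq_sum / len(responses)
    infit = sq_resid_sum / variance_sum
    return outfit, infit

# HYPOTHETICAL abilities and responses for six students (not study data).
thetas_demo = [-1.5, -0.5, 0.0, 0.5, 1.0, 2.0]
responses_demo = [0, 0, 1, 0, 1, 1]
outfit_demo, infit_demo = item_fit(responses_demo, thetas_demo, b=0.2)

print(f"outfit = {outfit_demo:.3f}, infit = {infit_demo:.3f}")
```

Because outfit weights every student equally, a single unexpected response from a student far from the item's difficulty inflates it more than it inflates infit.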
The scale demonstrated good internal consistency, as evidenced by a McDonald's ω of 0.776 (95% CI [0.736, 0.815]) and a Cronbach's α of 0.765 (95% CI [0.723, 0.803]), shown in Table 7. These values suggest a robust level of reliability, indicating that the items comprising the scale reliably measure the intended construct. The reliability coefficients obtained for the Rasch model (0.71) and the 2-PL model (0.74) are within an acceptable range for assessment [30].
Table 7
Reliability Statistics for Minho-SAS
| Estimate | McDonald's ω | Cronbach's α |
| --- | --- | --- |
| Point estimate | 0.776 | 0.765 |
| 95% CI lower bound | 0.736 | 0.723 |
| 95% CI upper bound | 0.815 | 0.803 |
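As a reminder of what Cronbach's α in Table 7 summarizes, the sketch below applies the standard formula α = k/(k - 1) × (1 - Σ item variances / variance of the total score) to a small hypothetical matrix of yes/no scores. The data are invented for illustration and have nothing to do with the study sample.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-student rows of item scores."""
    k = len(scores[0])
    # Sample variance of each item across students.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each student's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# HYPOTHETICAL scores: five students, three dichotomous items.
DEMO_SCORES = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]

alpha_demo = cronbach_alpha(DEMO_SCORES)
print(f"alpha = {alpha_demo:.3f}")
```

The statistic approaches 1 when the items rise and fall together across students, and drops as item scores become less related to one another.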
The assessment of reliability coefficients yielded values of 0.71 for the Rasch Model and 0.74 for the 2-PL Model. These coefficients provide insights into the internal consistency of the respective models in measuring the latent trait.
Figures 5 and 6 show how the reliability of the Minho-SAS varies across ability levels in the Rasch model (Fig. 5) and the 2-PL model (Fig. 6).
The conditional reliability plots obtained from the Rasch and 2-PL analyses (Figs. 5 and 6, respectively) give insight into the measurement precision of the Minho-SAS across ability levels. In the Rasch model, reliability is moderate (around 0.4) for individuals with extremely low abilities (around Θ = -6). As abilities progress towards Θ = -3 to -0.5, reliability increases markedly, stabilizing at a high plateau of around 0.8. This indicates that the Rasch model measures abilities consistently within this range. Beyond this range, there is a gradual but expected decrease in reliability, reflecting a natural attenuation in precision for individuals at higher proficiency levels. The 2-PL model, on the other hand, starts with lower reliability (less than 0.2) for those with the lowest abilities (Θ = -6) but improves more sharply. As abilities increase, reliability rises swiftly, exceeding 0.8 around Θ = -2, and maintains this high precision up to Θ = 0. This shows that the 2-PL model measures abilities consistently and accurately at these proficiency levels. Nonetheless, as abilities extend beyond Θ = 0 and reach higher levels (Θ = 4), reliability slowly decreases, indicating some difficulty in accurately estimating abilities at these extreme levels. These distinct reliability trends across ability levels show where the scale measures most precisely and offer guidance for future refinements of the assessment.
As for the correlation between the latent trait (theta), which represents student ability, and the sum of the Minho-SAS items, the correlation was extremely high: 0.960 for the 2-PL model and 0.982 for the Rasch model, as can be seen in Table 8.
Table 8
| | | Rasch | 2PL | Sum of MinhoSAS | OSATS | MinhoSAS Global Score |
| --- | --- | --- | --- | --- | --- | --- |
| Rasch | Pearson's r | — | | | | |
| | df | — | | | | |
| | p-value | — | | | | |
| 2PL | Pearson's r | 0.975 | — | | | |
| | df | 267 | — | | | |
| | p-value | < .001 | — | | | |
| Sum of MinhoSAS | Pearson's r | 0.982 | 0.960 | — | | |
| | df | 267 | 267 | — | | |
| | p-value | < .001 | < .001 | — | | |
| OSATS | Pearson's r | 0.670 | 0.698 | 0.656 | — | |
| | df | 135 | 135 | 135 | — | |
| | p-value | < .001 | < .001 | < .001 | — | |
| MinhoSAS Global Score | Pearson's r | 0.800 | 0.811 | 0.793 | 0.931 | — |
| | df | 267 | 267 | 267 | 135 | — |
| | p-value | < .001 | < .001 | < .001 | < .001 | — |
The correlation matrix shows a strong positive correlation (r = 0.975, p < 0.001) between the Rasch and 2-PL models, indicating a high degree of agreement between the two models in measuring the latent trait (suture skill). The correlations between the two models and the sum of Minho-SAS scores are also high (r = 0.982, p < 0.001 for Rasch and r = 0.960, p < 0.001 for 2-PL), indicating that students with higher Rasch and 2-PL ability values tend to receive higher total scores, which corroborates the construct validity of the scale. Compared with an existing validated scale, the OSATS, both the Rasch (r = 0.670, p < 0.001) and 2-PL (r = 0.698, p < 0.001) estimates show good positive correlations, which also adds to the validation of the scale. However, these values are not as high, and since the Minho-SAS has more items, it may be assessing more aspects of the skill. As for the global score, its strong associations with the Rasch and 2-PL estimates and the sum score (r = 0.793 to 0.811) also support the use of the scale.
To look for differences between the two assessed years, we conducted an independent samples t-test and estimated the effect size using Cohen's d, as displayed in Table 9.
Table 9
Effect Sizes (Cohen's d) for Minho-SAS
| Variable | Test | Statistic | df | p | Effect size measure | Effect size |
| --- | --- | --- | --- | --- | --- | --- |
| MinhoSAS Global Score | Student's t | 3.64ᵃ | 267 | < .001 | Cohen's d | 0.443 |
| Sum of MinhoSAS | Student's t | 2.80 | 267 | 0.005 | Cohen's d | 0.342 |
| 2PL | Student's t | 2.93 | 267 | 0.004 | Cohen's d | 0.357 |
| Rasch | Student's t | 2.98 | 267 | 0.003 | Cohen's d | 0.363 |
The comparison between the assessed years, seen in Table 9, revealed statistically significant differences in the Minho-SAS scores (p ≤ 0.005 for all four measures). However, the observed effect sizes (Cohen's d of 0.34 to 0.44) represent small to moderate effects, which suggests that the scale behaves fairly consistently across different cohorts of students. Interestingly, the effect size is higher for the global score than for the latent scores.
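The link between the t statistics and the effect sizes in Table 9 can be sketched with the usual conversion for an independent samples t-test, d = t × sqrt(1/n1 + 1/n2). The group sizes are not given in this section: the total of 269 students follows from df + 2, and the near-even split of 135 and 134 is an assumption made purely for illustration.

```python
import math

def cohens_d_from_t(t, n1, n2):
    """Cohen's d recovered from an independent samples t statistic."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)

# t statistics as reported in Table 9.
T_VALUES = {
    "MinhoSAS Global Score": 3.64,
    "Sum of MinhoSAS": 2.80,
    "2PL": 2.93,
    "Rasch": 2.98,
}

# ASSUMPTION: a near-even split of the 269 students (df = 267) across the two years.
N_YEAR_1, N_YEAR_2 = 135, 134

effect_sizes = {name: cohens_d_from_t(t, N_YEAR_1, N_YEAR_2)
                for name, t in T_VALUES.items()}

for name, d in effect_sizes.items():
    print(f"{name}: d = {d:.3f}")
```

Under this assumed split the recovered values agree with the reported effect sizes to within rounding of the t statistics, which is consistent with two cohorts of similar size.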