2.1. Participants
A total of 366 male (age: 22.2±4.9 years, height: 183.6±8.5 cm, mass: 80.8±11.9 kg) and 183 female (age: 23.0±4.7 years, height: 171.3±6.6 cm, mass: 66.2±10.4 kg) athletes gave their written informed consent to participate voluntarily in the study. These included 135 athletes 6 to 24 months after ACL reconstruction: 63 male (age: 24.5±6.5 years, height: 184.9±8.0 cm, mass: 84.4±12.7 kg) and 72 female (age: 23.9±5.1 years, height: 171.9±6.5 cm, mass: 69.4±12.3 kg). At the time of testing, all participants trained regularly, either for fitness or in various sports up to a professional level (35% track and field, 16% soccer, 11% handball, 10% basketball and 28% other sports). Exclusion criteria were acute pain in the knee joint and/or thigh muscles as well as previous injuries (hamstring strain injury within the last two years, contralateral or recurrent ACL injury). During the testing period, all participants maintained their normal level of physical activity, except for resistance training.
2.2. Instruments
The isokinetic dynamometer IsoMed 2000 (D&R Ferstl GmbH, Hemau, Germany) was used for all tests. A double shin pad for unilateral knee flexion and extension was attached to the dynamometer axis. The distal part of the shin pad was fastened with a strap ~2-3 cm proximal to the participant's medial malleolus. The device was calibrated before and after each testing session.
2.3. Procedures
The unilateral knee tests of the left and right legs followed a protocol with proven reliability [7,25] and were completed in a single testing session. The session started with a randomised test condition, followed by a standardised testing order (e.g. left extensor, right flexor, left flexor and right extensor). Each participant completed the protocol in a separate familiarisation session 48-168 hours before the testing session.[7] After their body mass had been determined, the participants completed a 10-minute warm-up (relaxed jogging and dynamic stretching of the lower extremity muscles).

As recommended, discrete movements in a single direction (uni-directional) were executed through the largest possible range of motion (ROM) to maximise the duration of voluntary activation [4,26]. For knee flexion (knee ROM 0-110°), participants lay prone (hip extended) and stabilised their trunk by pulling themselves towards the bench with their hands.[27] Knee extensions (knee ROM 90-0°) were performed in the supine position (hip extended), with handgrips providing sufficient stability.[28] To minimise inaccuracies caused by acceleration, participants were asked to remove their shoes.[7,25] With the muscles pre-activated at 0° knee flexion, the dynamometer axis was aligned with the participant's lateral femoral epicondyle using a laser pointer.[29]

After a static gravity correction measurement, the participants performed six submaximal (~50-80%) concentric (con) and eccentric (ecc) repetitions of the respective muscle group. The leg was returned passively to the starting position at 120°/s. Each set consisted of five repetitions (two at ~75% and three at 100% effort), and the last three repetitions were selected for further analysis. A 1-min rest between sets allowed sufficient recovery. For both muscle groups, concentric movements at 30°/s were executed before eccentric movements, which were followed by movements at 150°/s.[7,25,30] Strong verbal encouragement was provided throughout to elicit maximum effort.
2.4. Data processing
Raw data (200 Hz) were recorded with the manufacturer's software (IsoMed analyze V.2.0) and stored as ASCII files. Custom software (C++) isolated the isokinetic ROM (±1% deviation from the target angular velocity) and filtered the data (5th-order Butterworth low-pass filter, 6 Hz cut-off frequency). For each testing condition, the trial with the highest gravity-corrected peak moment (PM) and contractional work (CW) was selected. Conventional (PMHcon/PMQcon) and functional (PMHecc/PMQcon) hamstring-quadriceps ratios (H:Q ratios) as well as lateral differences were calculated for each angular velocity. Of the nine intersection points between the flexor and extensor moment-angle curves (one for each combination of the three flexor and three extensor test conditions), the one with the highest moment was identified as the dynamic control ratio at the equilibrium point (DCRe).[25] PM and CW were normalised to body mass to enable inter-individual comparison.
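The processing chain above can be sketched in Python with NumPy and SciPy. This is not the authors' C++ software: the zero-phase (forward-backward) application of the filter, the trapezoidal work integral over |moment|, the linear interpolation of curve intersections and all function names are assumptions made for illustration, and the input moment is assumed to be gravity-corrected already. Note that forward-backward filtering doubles the effective filter order.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS_HZ = 200.0  # sampling rate of the raw data


def isolate_isokinetic(angle_deg, velocity_dps, moment_nm, target_dps, tol=0.01):
    """Keep samples whose |angular velocity| deviates by at most 1% from the target.

    Assumes the isokinetic phase of a repetition is one contiguous block.
    """
    mask = np.abs(np.abs(velocity_dps) - target_dps) <= tol * target_dps
    return angle_deg[mask], moment_nm[mask]


def lowpass(signal, fs=FS_HZ, cutoff_hz=6.0, order=5):
    """5th-order Butterworth low-pass, applied forward-backward (assumption)."""
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)


def peak_moment_and_work(angle_deg, moment_nm):
    """Peak moment (Nm) and work (J) as the trapezoidal integral of |moment| over angle (rad)."""
    m = np.abs(moment_nm)
    dtheta = np.abs(np.diff(np.deg2rad(angle_deg)))
    work = np.sum(0.5 * (m[1:] + m[:-1]) * dtheta)
    return float(m.max()), float(work)


def process_trial(angle_deg, velocity_dps, moment_nm, target_dps, body_mass_kg):
    """Isolate the isokinetic ROM, filter, and return PM (Nm/kg) and CW (J/kg)."""
    ang, mom = isolate_isokinetic(angle_deg, velocity_dps, moment_nm, target_dps)
    pm, cw = peak_moment_and_work(ang, lowpass(mom))
    return pm / body_mass_kg, cw / body_mass_kg


def hq_ratios(pm_h_con, pm_h_ecc, pm_q_con):
    """Conventional (H_con/Q_con) and functional (H_ecc/Q_con) H:Q ratios."""
    return pm_h_con / pm_q_con, pm_h_ecc / pm_q_con


def intersections(angles, a, b):
    """(angle, moment) points where curves a and b cross, by linear interpolation."""
    d = a - b
    pts = [(angles[i], a[i]) for i in range(len(d)) if d[i] == 0]
    for i in range(len(d) - 1):
        if d[i] * d[i + 1] < 0:  # sign change between two samples
            t = d[i] / (d[i] - d[i + 1])
            pts.append((angles[i] + t * (angles[i + 1] - angles[i]),
                        a[i] + t * (a[i + 1] - a[i])))
    return pts


def dcre(angles, flexor_curves, extensor_curves):
    """Highest-moment intersection over all flexor x extensor combinations (3 x 3 = 9)."""
    best = None
    for f in flexor_curves:
        for e in extensor_curves:
            for ang, mom in intersections(angles, f, e):
                if best is None or mom > best[1]:
                    best = (float(ang), float(mom))
    return best  # (angle in degrees, moment) or None if the curves never cross
```

Per testing condition, `process_trial` would be applied to each of the three maximal repetitions and the highest PM and CW retained, e.g. `max(pm for pm, _ in results)`; the flexor and extensor curves passed to `dcre` are assumed to be resampled onto a common angle grid beforehand.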
2.5. Statistical analysis
2.5.1 The Random Forest
For the ML approach, the Random Forest was chosen because it can handle collinearity as well as non-linear relationships between features and outcome. Collinearity had to be considered because multiple measurements of the same muscle group were performed; high correlations between some features are therefore expected and plausible. The features also include ratios, which tend to have an optimal range (a “sweet spot”) rather than a monotonic relationship with the outcome, so a model that does not assume a purely linear relationship is preferable.
The Random Forest is based on multiple decision trees that are grown and evaluated independently of each other. The proportion of trees predicting a specific outcome can be interpreted as the estimated likelihood of that outcome, and the final binary classification is obtained by applying a threshold to this likelihood. Each tree is grown with the CART (classification and regression tree) algorithm on a bootstrap sample of the data. At each node, the algorithm searches for the optimal cut within a random subset of the features and divides the observations accordingly; the optimal cut is the one that maximises the reduction in Gini impurity. This process is repeated for the resulting subgroups of observations until the two classes are perfectly separated, which yields a tree-like structure of binary decisions. Because each tree is built on its own bootstrap sample and draws a random subset of features at each split, every tree is different. This diversity is the main motivation for the Random Forest, as the ensemble captures a multitude of relationships and patterns within the data.[31]
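To make the split criterion and the vote-based likelihood concrete, the following pure-Python sketch computes the Gini impurity decrease for candidate cuts on a single feature, draws a bootstrap sample and a random feature subset, and turns tree votes into a likelihood. It is a didactic illustration of the mechanics described above, not the implementation in the R package used in the study; all function names are hypothetical, and the subset size `mtry` is left as a parameter (the randomForest default for classification is the floor of the square root of the number of features).

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_cut(values, labels):
    """Cut on one feature that maximises the decrease in (size-weighted) Gini impurity.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    Returns (threshold, impurity_decrease); threshold is None if no cut helps.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini(labels)
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut possible between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        if parent - child > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, parent - child)
    return best


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement (one bootstrap sample per tree)."""
    return [rng.randrange(n) for _ in range(n)]


def feature_subset(n_features, mtry, rng):
    """Random subset of mtry feature indices considered at one split."""
    return rng.sample(range(n_features), mtry)


def forest_likelihood(tree_votes, positive=1):
    """Share of trees voting for the positive class (estimated likelihood)."""
    return sum(v == positive for v in tree_votes) / len(tree_votes)


def classify(likelihood, threshold):
    """Binary decision from the likelihood and a chosen threshold."""
    return int(likelihood >= threshold)
```

For example, `best_cut([1, 2, 3, 4], [0, 0, 1, 1])` places the cut at 2.5, where both child nodes are pure and the impurity drops from 0.5 to 0.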
In the present study, a total of 62 features were available (see appendix). These included the PM and CW for each leg as well as the lateral differences for each test condition. In addition, the conventional and functional H:Q ratios and the DCRe were calculated. To incorporate single-leg features into the model, the self-reported dominant leg in the control group and the unaffected leg in the ACL group were categorised as the advantageous (adv) leg. Correspondingly, the non-dominant leg and the injured leg were categorised as the disadvantageous (dis) leg. The Random Forest models were computed in R with the package “randomForest”, using 200 trees per model.
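The adv/dis mapping can be expressed as a small function. The feature naming scheme (a `_left`/`_right` suffix) and the function names are hypothetical and only illustrate the relabelling rule stated above.

```python
OTHER_SIDE = {"left": "right", "right": "left"}


def assign_adv_dis(group, dominant_leg=None, injured_leg=None):
    """Map the advantageous (adv) and disadvantageous (dis) roles to 'left'/'right'.

    control: self-reported dominant leg -> adv, non-dominant leg -> dis
    acl:     unaffected leg -> adv, injured (reconstructed) leg -> dis
    """
    if group == "control":
        adv = dominant_leg
    elif group == "acl":
        adv = OTHER_SIDE[injured_leg]  # the unaffected leg
    else:
        raise ValueError(f"unknown group: {group!r}")
    if adv not in OTHER_SIDE:
        raise ValueError("leg must be 'left' or 'right'")
    return {"adv": adv, "dis": OTHER_SIDE[adv]}


def relabel_features(features, group, dominant_leg=None, injured_leg=None):
    """Rename hypothetical '<name>_left'/'<name>_right' keys to '<name>_adv'/'<name>_dis'."""
    legs = assign_adv_dis(group, dominant_leg, injured_leg)
    role_of_side = {side: role for role, side in legs.items()}
    relabelled = {}
    for name, value in features.items():
        stem, _, side = name.rpartition("_")
        if side in role_of_side:
            relabelled[f"{stem}_{role_of_side[side]}"] = value
        else:
            relabelled[name] = value  # features without a side suffix keep their name
    return relabelled
```

In a Python workflow, the relabelled features could then be passed to an ensemble such as scikit-learn's `RandomForestClassifier(n_estimators=200)`, which is an analogue of, not identical to, the R package used in the study.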
2.5.2 Assessing predictive performance
The predictive performance of the model was evaluated with 10-fold cross-validation.[32] To account for the imbalanced data, the cross-validation was stratified so that each fold contained a similar number of injured participants. The classification threshold for the predicted likelihood was optimised with the Youden index (J = sensitivity + specificity - 1).[33] Performance was assessed with sensitivity and specificity as well as the Receiver Operating Characteristic (ROC) curve and the corresponding area under the curve (AUC).[34] To interpret the AUC, we applied the rule of thumb of Hosmer et al.[35]: AUC = 0.5, no discrimination; 0.5 < AUC < 0.7, poor; 0.7 ≤ AUC < 0.8, acceptable; 0.8 ≤ AUC < 0.9, excellent; and AUC ≥ 0.9, outstanding.
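The validation steps above (stratified fold assignment, Youden-optimal threshold, sensitivity and specificity, AUC and the Hosmer et al. categories) can be sketched in pure Python. Function names are hypothetical; the scores passed in are assumed to be out-of-fold predicted likelihoods with label 1 for injured participants, and the AUC is computed through its rank (Mann-Whitney) interpretation rather than by integrating the ROC curve.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Fold index (0..k-1) per observation; each class is shuffled, then dealt round-robin."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, i in enumerate(members):
            folds[i] = position % k
    return folds


def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when 'injured' (label 1) is predicted for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(scores, labels):
    """Observed score used as threshold that maximises J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        se, sp = sens_spec(scores, labels, t)
        if se + sp - 1 > best_j:
            best_t, best_j = t, se + sp - 1
    return best_t, best_j


def auc(scores, labels):
    """AUC as P(score of an injured case > score of a control case), ties counting 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def hosmer_label(a):
    """Rule-of-thumb category; AUC <= 0.5 is mapped to 'no discrimination' here (assumption)."""
    if a >= 0.9:
        return "outstanding"
    if a >= 0.8:
        return "excellent"
    if a >= 0.7:
        return "acceptable"
    if a > 0.5:
        return "poor"
    return "no discrimination"
```

With scores `[0.1, 0.4, 0.35, 0.8]` and labels `[0, 0, 1, 1]`, three of the four injured/control pairs are ordered correctly, giving an AUC of 0.75 ("acceptable"), and the Youden-optimal threshold is 0.35 (sensitivity 1.0, specificity 0.5).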
Because leg symmetry >90% in isokinetic PM has been shown to reduce re-injury risk,[10] it was used as a reference. For this comparison, the association between injury status and the lateral difference in concentric quadriceps strength, measured as the difference in PM at 30°/s, was analysed with the same performance measures.
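The symmetry reference can be sketched as follows. The text does not define how the lateral difference is computed; the relative difference and the limb symmetry index (LSI) below are common definitions assumed for illustration, with the disadvantageous leg in the numerator of the LSI and hypothetical function names.

```python
def lateral_difference(pm_adv, pm_dis):
    """Relative lateral difference in % of the advantageous leg (assumed definition)."""
    return 100.0 * (pm_adv - pm_dis) / pm_adv


def limb_symmetry_index(pm_adv, pm_dis):
    """LSI in %: disadvantageous leg relative to the advantageous leg (assumed definition)."""
    return 100.0 * pm_dis / pm_adv


def meets_symmetry_criterion(pm_adv, pm_dis, cutoff=90.0):
    """True if the LSI exceeds the 90% criterion."""
    return limb_symmetry_index(pm_adv, pm_dis) > cutoff
```

For the ROC analysis, the lateral difference of each participant (concentric quadriceps PM at 30°/s) would serve as the score, with larger differences expected to indicate the ACL group.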
2.5.3 The ALE plot
To further investigate the model and its features, Accumulated Local Effects (ALE) plots were used. An ALE plot describes how a feature influences the prediction on average; the main idea is to observe changes in the prediction when the feature is altered. The ALE plot for a single feature is derived by dividing the values of the feature into intervals. For each observation within an interval, the predicted likelihood is calculated twice, once with the feature set to the upper bound of the interval and once with it set to the lower bound, while all other features keep their observed values. The differences between these two likelihoods are then averaged across all observations within the interval; this average is the local effect, which defines a straight line within the interval. These straight lines are then accumulated across the intervals and centred for visualisation. Because each observation is only shifted within its own small interval, the procedure avoids evaluating the model on unrealistic combinations of feature values. The range and direction of the ALE plot can be used to quantify a feature's influence on the prediction of an ML model.[36] ALE plots were computed with the R package “ALEPlot” using 30 intervals.
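A minimal NumPy sketch of a first-order ALE curve, following the interval-based procedure described above with 30 quantile-based intervals by default. It is not the ALEPlot implementation: the quantile grid, the assignment of each observation to the interval (z[k-1], z[k]], and the centring (subtracting the count-weighted mean of the curve over the intervals) are assumptions. `predict` is any function returning predicted likelihoods for a feature matrix.

```python
import numpy as np


def ale_1d(predict, X, j, n_intervals=30):
    """First-order ALE of feature j; returns (interval bounds, centred ALE at those bounds)."""
    x = X[:, j]
    grid = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_intervals + 1)))
    n_bins = len(grid) - 1
    # interval index k with grid[k] < x <= grid[k + 1]; the minimum joins the first interval
    k = np.clip(np.searchsorted(grid, x, side="left") - 1, 0, n_bins - 1)
    lower, upper = X.copy(), X.copy()
    lower[:, j] = grid[k]      # feature set to the interval's lower bound
    upper[:, j] = grid[k + 1]  # feature set to the interval's upper bound
    diff = predict(upper) - predict(lower)  # all other features keep their observed values
    counts = np.bincount(k, minlength=n_bins)
    sums = np.bincount(k, weights=diff, minlength=n_bins)
    local = np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)
    ale = np.concatenate([[0.0], np.cumsum(local)])  # accumulate the local effects
    mids = 0.5 * (ale[:-1] + ale[1:])
    ale -= np.sum(mids * counts) / counts.sum()  # centre (assumed weighting)
    return grid, ale
```

For a scikit-learn classifier `model`, `predict` could be `lambda X: model.predict_proba(X)[:, 1]`. For a model that is linear in feature j with slope b, the sketch returns a straight line of slope b, which is a quick sanity check of the accumulation step.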