On the estimation of hip joint loads through musculoskeletal modeling

Noninvasive estimation of joint loads is still an open challenge in biomechanics. Although musculoskeletal modeling represents a solid resource, multiple improvements are still necessary to obtain accurate predictions of joint loads and to translate such potential into practical utility. The present study, focused on the hip joint, is aimed at reviewing the state-of-the-art literature on the estimation of hip joint reaction forces through musculoskeletal modeling. Our literature inspection, based on well-defined selection criteria, returned seventeen works, which were compared in terms of methods and results. Deviations between predicted and in vivo measured hip joint loads, taken from the OrthoLoad database, were assessed through quantitative deviation indices. Despite the numerous modeling and computational improvements made over the last two decades, predicted hip joint loads still deviate from their experimental counterparts and typically overestimate them. Several critical aspects have emerged that affect muscle force estimation, hence joint loads. Among them, the physical fidelity of the musculoskeletal model, with its parameters and geometry, plays a crucial role. Also, predicted joint loads are markedly affected by the selected muscle recruitment strategy, which reflects the underlying motor control policy. Practical guidelines for researchers interested in noninvasive estimation of hip joint loads are also provided.


Introduction
In a musculoskeletal (MSK) system, joint reaction forces (JRFs) are the internal loads due to the contact actions exchanged by the articulating surfaces during motion (Open-Sim 2021; Vigotsky et al. 2019). Their knowledge is an important aspect in the study of human movement. Indeed, JRFs can provide valuable insights into movement disorders such as cerebral palsy (Steele et al. 2012) or into degenerative articular diseases such as osteoarthritis (Kumar et al. 2013). They are also important to define the loading conditions for preclinical experimental tests on artificial joints (Affatato and Ruggiero 2019;Bergmann et al. 2016;Heller et al. 2005), or they could serve as an input to finite-element models aimed at evaluating wear processes in joint implants (Lin et al. 2018;Mattei et al. 2021;Ruggiero et al. 2018) or at predicting bone adaptations (Geraldes and Phillips 2010).
The actual joint loads are hardly measurable in vivo, and to date, noninvasive estimation of JRFs remains one of the main challenges in biomechanics (Heller et al. 2001;Hug et al. 2015). Over the last two decades, instrumented joint prostheses (Bergmann et al. 1988) have emerged as powerful tools to measure JRFs during daily activities. These special prostheses have been implanted at the knee, hip, shoulder, and vertebral joints, and free public databases [e.g., (Fregly et al. 2012;OrthoLoad 2021)] have been created: they include loading, kinematics and kinetics data registered during the analyzed motor tasks. However, due to ethical and practical concerns, instrumented prostheses have been used in a limited number of individuals and motor tasks. Furthermore, since these measurements reflect only a small cohort of the population, namely pathological subjects who had their natural hips replaced by artificial ones, they may fail to represent typical hip joint loads of unimpaired subjects.
Computational approaches based on MSK computer modeling (Bassani and Galbusera 2018;Delp et al. 2007) represent a promising alternative for noninvasively investigating the relationship between body motion and biomechanical loads [e.g., (DeMers et al. 2014;Imani Nejad et al. 2020;Li 2021;Lin et al. 2018;Marra et al. 2015;Martelli et al. 2011;Stansfield et al. 2003;van Veen et al. 2019;Weinhandl and Bennett 2019;Zhang et al. 2015)]. As clearly outlined in Fregly (2021), the potential of using computational MSK modeling techniques in clinical and non-clinical practices is appealing, but progresses are still needed to gain practical utility: indeed, before adopting such models and procedures, their ability to obtain reliable, replicable, and accurate estimations shall be assessed through appropriate validation against experimental data.
Apparently, the work by Moissenet et al. (Moissenet et al. 2017) is the only systematic review paper on the use of MSK modeling for the estimation of JRFs at the knee and hip joints. In particular, the authors investigated what alterations of a generic MSK model provided more accurate estimations of the JRFs at the two joints. However, only 5 of the included studies (24 in total) concerned the hip joint, while the others focused on the knee joint: this was probably motivated by the greater availability of experimental data from both motion capture and instrumented knee prostheses [e.g., (Fregly et al. 2012;Taylor et al. 2017)].
The present paper is an extensive review of the literature on the use of MSK models for computing JRFs at the hip joint. Our main goal was to assess how well MSK models developed for specific patients with artificial instrumented implants have been able to reproduce the measured hip joint loads, considered as target values. Additionally, we wanted to investigate the effects of specific modeling choices on the prediction of hip JRFs. This should also serve to identify the features that mostly improve JRF estimation and to provide guidelines for researchers. It should be highlighted that the present analysis was specifically focused on dynamic MSK multibody simulations, but also static models, like the recent one by Fischer et al. (2021), could represent a valuable alternative when slow dynamic activities (with limited inertial effects) are analyzed, such as standing and slow walking.
This study consists of three steps: (i) select and classify state-of-the-art research papers based on well-defined criteria, (ii) compare them in terms of methods and results, and (iii) identify potential research gaps and suggest future directions for research in this area. Section 2 reviews the typical computational workflow for estimating JRFs at the hip through MSK modeling. Section 3 presents the criteria used to select papers from the current literature, and how they were classified. Section 4 describes how results from the selected studies have been quantitatively compared. Lastly, Sects. 5 and 6 are devoted to the discussion and conclusions, respectively.
2 Typical computational pipeline to estimate joint loads through MSK modeling Figure 1 shows the typical computational pipeline adopted to estimate JRFs through MSK modeling starting from experimental kinematic (i.e., marker trajectories) and kinetic (i.e., ground reaction forces, GRFs) data (Sylvester et al. 2021). The term forces in JRFs and GRFs refers to both forces and moments.
The first preliminary step consists in scaling a generic MSK model to match the subject's anthropometry: geometric features (bone dimensions, joint distances, muscle attachment points) and inertial properties of body segments (segment mass, center of mass location, and inertia tensor) are typically adapted. The most common scaling procedure is based on surface markers placed on anatomical reference points. Errors due to an incorrect marker placement as well as to soft tissue artifacts may be introduced. Scaling is a very critical step that affects all the model's outputs, as it is Fig. 1 Computational workflow typically adopted to estimate JRFs through MSK modeling frequently discussed in the literature where different scaling laws and procedures are investigated (e.g., Fischer et al. 2021;Kainz et al. 2017;Lund et al. 2015)).
After scaling, an inverse kinematics (IK) analysis is performed to estimate the joint angles that best fit the experimental marker trajectories. Similar to scaling, IK can be affected by marker movement (Fiorentino et al. 2020;Lamberto et al. 2017), the adopted marker sets (e.g., Helen-Hayes and its modifications (Davis et al. 1991;Kadaba et al. 1990), CAST (Cappozzo et al. 1995), LAMB (Rabuffetti and Crenna, 2004), etc.) (Mantovani and Lamontagne, 2017;Stief, 2018) as well as by experimental data preprocessing (Rácz and Kiss, 2021) and model choice (Falisse et al. 2018;Roelker et al. 2017;Wagner et al. 2013). Measured GRFs are then used in an inverse dynamics (ID) algorithm to calculate the net joint forces and torques (generalized forces) required to drive the model toward the desired kinematics. Since such forces also depend on the estimated inertial parameters of the body segments, affected by unavoidable uncertainties, a residual reduction algorithm ("How RRA Works-OpenSim Documentation" 2021) is frequently adopted to solve ID: it applies small modifications to the inertial parameters and joint angle trajectories to improve the model's consistency with the rigid-body dynamic equations. Although the effect of the variation of such inertial parameters is still controversial, it seems that joint torque estimation is more affected by uncertainty in marker placement than in inertial parameters (Camomilla et al. 2017), at least for activities of daily living.
Scaling, IK and ID are based on rigid body mechanics and can be equally applied to humans and robots. Differences arise when mechanical joint actuators are replaced by "special cables" representing the actions of musculotendon actuators on bones. This peculiarity of MSK modeling has a double implication: the characteristics of musculotendon geometry and physiology should be considered, and the problem becomes statically indeterminate due to the intrinsic redundancy of the muscles.
In most MSK approaches, a 3-element Hill-type musculotendon model (Millard et al. 2013;Thelen 2003;Zajac 1989) is implemented (Fig. 2). The total force F T exerted by the musculotendon unit includes both active and passive contributions from the muscle according to the force-length-velocity functions, and it is modulated by the neural activation a where F M 0 is the muscle maximum isometric force, f T is the tendon force-length function, f L and f V are the muscle active force-length and force-velocity functions, f PE is the muscle passive force-length function, l T and l M are the tendon and muscle lengths, respectively ( l T s and l M 0 are their slack length and optimal fiber length), v M is the muscle fiber velocity and v M max its maximum value, and is the pennation angle. The values of such parameters are typically derived from generic cadaver studies and/or MRI image repositories [e.g., (Handsfield et al. 2014;Ward et al. 2009)]. In several cases, F M 0 is adapted to the specific subject according to specific laws, that, for example, take into account the subject's mass and height (Correa and Pandy 2011;Handsfield et al. 2014). In some cases, an elementary model is assumed that simply expresses the musculotendon force as Obviously, musculotendon actuators require a welldefined geometry: origin and insertion points within the skeletal structure, as well as their specific paths. Such geometrical information is crucial to properly define muscle moment arms. Like musculotendon parameters, musculotendon geometry is often extracted from cadaver datasets and/ or medical imaging databases used to define the reference model. Musculotendon geometry can also be adapted to the characteristics of a specific subject by using subject-specific (SS) medical imaging data (Martín-Sosa et al. 2019).
To resolve muscle redundancy, an optimization approach, namely static optimization (SO), is typically used. Its cost function, minimized at each instant of time t i , is expressed as the sum of muscle activations or of muscle stresses (muscle force divided by physiological cross-sectional area) raised to the pth power. A slightly different approach is the so-called (1) Hill-type musculotendon model (Millard et al. 2013) computed muscle control (CMC), which uses SO along with feedforward and feedback control to drive the MSK model toward the experimental kinematics Thelen and Anderson 2006). EMG-informed strategies were also devised to tackle the problem of muscle load sharing (Pizzolato et al. 2015): the recorded EMG signals are input into the MSK model with the goal of informing the model about the SS muscle recruitment strategy. In contrast to SO approaches, dynamic optimization (Anderson and Pandy 2001a) and optimal control (Falisse et al. 2019) can synthesize muscle forces while minimizing a time-dependent performance criterion over the whole activity duration and imposing the MSK system dynamics in the form of differential constraint equations. Regardless of the method used to solve the muscle recruitment problem, once muscle forces are known, JRFs can be derived from the equilibrium equations (the reader is referred to "Appendix" for a more detailed description, with a specific focus on the hip joint). The hip joint reaction forces (HJRFs) are usually represented by the resultant force F h applied at the hip joint center H, and a torque T h equal to the resultant moment about H. They represent a system of forces equivalent to the contact actions at the articular surfaces (normal, due to pressure, and tangential, due to the friction/viscous actions of the synovial fluid) and the ligament actions. Typically, articular friction and ligament actions are neglected (compared to the other forces) so that T h ≈ 0 . However, if one seeks to discern contact and ligament contributions, such elements should be properly modeled.

Selected studies and their classification
As briefly commented in the previous section, numerical prediction of HJRFs, and more in general of joint loads, is affected by several uncertainties arising from experimental data, model choice and scaling, MSK characteristics, etc. In this study, we focused on those models whose accuracy could be quantitatively assessed by comparing their results with experimental data from instrumented prostheses. Thus, relevant literature studies on the estimation of HJRFs were selected according to the following inclusion criteria: I. Research journal publications (conference proceedings were excluded). II. Published in the last two decades (2000-today). III. MSK simulations based on dynamic models, IV.a. Simulation results validated against experimental data from instrumented prostheses, for each patient separately [works that used a "typical patient" were excluded, 1 e.g., Heller et al. (2005) and Wesseling et al. (2015)].
Our inspection of such literature returned a first set of works collected in Table 1 (non-colored rows, eleven works). Then, we enlarged such set by expanding the criterion IV.a as follows: IV.b. Simulation results compared with experimental data from instrumented prostheses but based on kinematic-kinetic data from other subjects (either pathological or healthy).
This additional criterion provided a second set of works, also collected in Table 1 (colored rows, six works). It is worth highlighting that a proper model validation could only be performed for the studies included by the criterion IV.a. The results of the studies included according to the criterion IV.b were still compared with experimental data from instrumented prostheses, although it remains questionable whether this is appropriate for different subjects, particularly healthy ones.
In the whole, the extended selection comprises seventeen papers (Table 1). Those research papers from the same author(s) and based on the same model and/or procedures were unified in the same row if they did not provide any additional information for the purpose of our analysis.
Studies in Table 1 were then classified according to several criteria, which will be detailed in the following sections: The modeling aspects presented in Sects. 3.3.1 through 3.3.5 are then discussed in the corresponding Sects. 5.1 through 5.5.

Experimental datasets for simulation and validation
Estimated HJRFs can be validated against experimental data from instrumented hip prostheses. These prostheses are provided with strain gauges and transmit the acquired data to an external unit through a telemetric system (Bergmann et al. 1988). Thanks to a calibration process, strain measures can ultimately be used to obtain HJRFs. A first set of hip load measures was collected starting from 1997 (Bergmann et al. 2001a): a hip implant of type Hip I (Bergmann et al. 1988) or Hip II (Graichen et al. 1999) ( Table 2) was made with telemetric sensors that could wirelessly relay force data. Such prostheses were implanted in four subjects, and force data, along with kinematics and kinetics, were collected during activities of daily living. More recently, an advanced instrumented hip prosthesis, namely Hip III (Damm et al. 2010), able to measure friction moments in addition to contact forces, was implanted in ten patients and load data were collected (Bergmann et al. 2016). While kinematics and kinetics were made available for all the investigated activities in Bergmann et al. (2001a), this was done only for walking and standing in Bergmann et al. (2016). Table 2 compares the two datasets in terms of the adopted hip implant, the analyzed subjects, the investigated motor tasks, and the collected motion capture data ("MOCAP"). Both datasets are freely available on the OrthoLoad platform (OrthoLoad, 2021).
All the collected studies used the in vivo measured loads from OrthoLoad datasets (Bergmann et al. 2016(Bergmann et al. , 2001a as target values for their estimations (Table 1, "Experimental dataset" column).

Simulation frameworks
Most of the selected works performed MSK modeling on dedicated software (Table 1, "Software" column): seven are based on the open-source framework OpenSim (Delp et al. 2007), while six adopted the commercial software AnyBody  First row: types of instrumented hip prostheses (OrthoLoad 2021). Hip I and Hip II are provided with 3 strain gauges inside the neck to measure the 3 force components acting at the center of the ceramic ball. Hip II has a fourth strain gauge to detect the strain of the stem. Hip III has 6 strain gauges inside the neck and measures the 3 components of the friction moment in addition to the contact forces (Damsgaard et al. 2006). Furthermore, the work in Li (2021) is based on FEBio (Maas et al. 2012), a software tool for nonlinear finite element analysis in biomechanics, while (LaCour et al. 2020) is based on AUTOLEV (Levinson and Kane 1990), an interactive symbolic dynamics program. Finally, a few studies rely on in-house platforms for MSK simulations (Heller et al. 2001;Martelli et al. 2011;Stansfield et al. 2003).

MSK models
For the following sections, the reader is referred to the "MSK model" column in Table 1.

Lower-body versus full-body models
Models including the lower limbs and pelvis only (lowerbody, LB) were the most used within the selected studies (Heller et al. 2001;Li 2021;Martelli et al. 2011;Modenese et al. 2013Modenese et al. , 2011Modenese and Phillips, 2012;Stansfield et al. 2003;Zhang et al. 2015), while some recent works rely on models that include also the trunk and the head (De Pieri et al. 2019Fischer 2018;Hoang et al. 2019LaCour et al. 2020;Lunn et al. 2020;Wesseling et al. 2016). Many studies simplified LB models to have only one leg (LB unilateral). In that case, additional actuators acting on the pelvis were introduced to model the dynamic contributions associated with the missing torso and the contralateral leg. None of the studies has included the arms into the model (no full-body, FB, model was used), and their inertial properties were lumped with those of the torso and the head (if any).

Models of the hip joint
In most cases, the hip joint was modeled as an ideal 3-dof spherical joint without explicitly modeling either articular contact, lubrication, or ligaments. However, three of the selected studies implemented a more complex hip joint architecture, as detailed below.
Zhang et al. (Zhang et al. 2015) assumed a 6-dof hip joint, and HJRFs were estimated adopting a force-dependent-kinematic (FDK) approach (Skipper Andersen et al. 2017). In particular, HJRFs were predicted using a linear force-volume penetration law (similar to the elastic foundation theory) to estimate the contact actions between the compliant surface of the polyethylene acetabular cup and the rigid surface of the femoral head. Moreover, a linear spring (with stiffness 5·10 4 N/m) connecting the centers of the acetabular cup and the femoral head was added to model the overall action of the capsule ligaments (Fig. 3A).
A very recent study (Li 2021) incorporated an MSK model with rigid bones and a compliant hip joint with SS cartilage geometry into a finite-element software to simultaneously solve for rigid body dynamics, muscle forces and HJRFs. Lastly, in LaCour et al. (2020), the three translational dofs of the hip joint were controlled by a contact-detection algorithm based on a spring-damper representation of the acetabular surface. In particular, the contact surfaces of the acetabular cup and the femoral component were represented as a polynomial surface and a point cloud, respectively (Fig. 3B). Furthermore, the primary hip capsular ligaments were modeled as nonlinear springs, with parameters derived from published works.

MSK geometry
Most of the selected works (De Pieri et al. 2019Fischer 2018;Heller et al. 2001;Hoang et al. 2019Li 2021;Lunn et al. 2020;Martelli et al. 2011;Modenese et al. 2013Modenese et al. , 2011Modenese and Phillips 2012;Weinhandl and Bennett, 2019) are based on models with MSK geometry (i.e., bone dimensions and musculotendon geometry) derived from generic cadaver repositories or other available datasets based on medical images. Such model templates have been linearly scaled to the subject's anthropometry by using surface markers and/or osseous landmarks.
The model in Heller et al. (2001) is based on the Visible Human (VH) cadaver dataset ("The National Library of Medicines Visible Human Project" 2022), while most of the collected studies derived musculotendon geometry from the recent cadaver dataset TLEM 2.0 (Carbone et al. 2015) and its precursor TLEM (Klein Horsman et al. 2007). TLEM-based models were implemented both in AnyBody (De Pieri et al. 2019Fischer 2018;Li 2021;Lunn et al. 2020;Zhang et al. 2015) and OpenSim (Modenese et al. 2013(Modenese et al. , 2011Modenese and Phillips 2012). TLEM 2.0 is based on medical imaging data from a single subject, and it provides muscular geometrical information with the highest level of detail currently available. Among the collected studies, SS CT scans were used in Fischer (2018) and Stansfield et al. (2003) to adjust the hip joint geometry and the location of the hip joint centers and in Zhang et al. (2015) and LaCour et al. (2020) to determine bone dimensions. Furthermore, both CT and MRI SS images were used in Wesseling et al. (2016) to define bone geometry and joint positions as well as muscle paths, origins and insertions.

Musculotendon model
In all studies, a 3-element Hill-type musculotendon model was implemented. However, muscle contraction dynamics and the contribution from the parallel passive element were neglected in AnyBody-based studies as well as in a few OpenSim-based studies (Modenese et al. 2013(Modenese et al. , 2011Modenese and Phillips 2012;Weinhandl and Bennett 2019). The influence on HJRFs of the choice of the muscle model, simplified vs. complex, is discussed in Fischer (2018).
The generic l M 0 and l T s parameters were linearly scaled with segment lengths in all studies: in the OpenSim-based studies, the ratio l M 0 ∕l T s of the generic model is preserved in the scaled model, while in AnyBody it is adjusted to maintain the joint angle at which muscle force peaks. F M 0 was left to its generic value in all the studies except for (De Pieri et al. 2018), where it was linearly scaled accounting for the body mass and fat percentage.
To assess inter-subject variability, Hoang et al. (Hoang et al. 2019 investigated the effect of SS, EMG-driven calibration (Pizzolato et al. 2015) of musculotendon parameters by simultaneously minimizing the error on joint torques and the magnitude of the resultant HJRF.

Muscle recruitment strategies
The collected works have adopted two different muscle recruitment strategies, namely SO and EMG-driven (Table 1, "Muscle recruitment" column). However, as evident from Table 1, SO was the most adopted. Many researchers minimized the sum of squared muscle activations (or muscle stresses) (Hoang et al. 2019Li 2021;Martelli et al. 2011;Modenese et al. 2013;Modenese and Phillips 2012;Weinhandl and Bennett 2019;Wesseling et al. 2016;Zhang et al. 2015), while others minimized the sum of cubed muscle activations (or muscle stresses) (De Pieri et al. 2019Fischer 2018;Lunn et al. 2020). In Heller et al. (2001) and Stansfield et al. (2003), minimization of the total muscle force was selected as recruitment strategy (to prevent excessive loading of individual muscles, Heller et al. (2001) included a limitation on the maximum muscle force). The minimax criterion was used in Fischer (2018;Stansfield et al. (2003); Zhang et al. (2015). A variation of the classical squared muscle activation cost function is described in Wesseling et al. (2016), where the magnitude of the resultant HJRF was weighted and concurrently minimized at each instant of time.
Only the works by Hoang et al. (2019) compared the HJRFs estimated through SO with those obtained through EMG-informed approaches.

Quantitative comparison of the results across studies
As also pointed out in Moissenet et al. (2017), a direct comparison of the obtained HJRFs across the selected studies was, unfortunately, limited by the different ways in which results have been presented, as well as by the lack of a common error metric to assess the deviations between in vivo data and simulated results. When available, specific deviation indices were extrapolated from the studies included by criterion IV.a (for which a proper model validation was possible) in order to assess the goodness of fit of the predicted HJRFs with respect to their experimental counterparts. In most cases, the adopted deviation index was obtained, for each patient, as the average across the trials of the mean deviation obtained in each trial. Only in Zhang et al. (2015) the deviations were calculated on the mean trial for each subject. Deviations were computed on the resultant HJRF, except for Heller et al. (2001), where the force components were considered. Specifically, the following deviation indexes were considered: • RMSE (Root mean square error) it is an indicator of the global deviation over the activity cycle and is expressed as a percentage of the body weight (%BW). • RPPD (Relative peak-to-peak deviation) it is an indicator of a local deviation between the experimental and the corresponding simulated peak force values. When two peaks manifest themselves, as typically happens in walking tasks, the worse index was considered. RPPD is expressed as a percentage of the experimental peak value (%EP). • RDEP (Relative deviation at experimental peak) similar to RPPD, it is calculated at the instant of experimental peak value and is expressed as a percentage of such experimental value (%EP) as well. As opposed to RPPD, any potential time shift between simulated and experimental curves is ignored. Figure 4 details the two local deviation indices considering a typical experimental resultant HJRF and the corresponding one simulated for a specific patient (and a specific trial). Table 3 shows such deviation indices for the studies that made them available (i.e., nine out of the eleven studies in non-colored rows from Table 1), and for each analyzed motor task. LaCour et al. (2020) and Modenese et al. (2013) did not report any of the deviation index.
Depending on the results made available by the authors of the selected studies, both RPPD and RDEP in Table 3 were estimated by (i) averaging the (absolute) relative deviations across all subjects (mean), and by (ii) taking the maximum (absolute) relative deviation across all subjects (max), while the reported values for RMSE represent the average deviation across all trials and patients for a specific activity. However, caution should be used when comparing data in Table 3, for two reasons: (i) when averaging RPPD and RDEP across trials and subjects, cancellations due to opposite signs may result if signed deviations are considered (instead of their absolute values), as in Heller et al. (2001) and Zhang et al. (2015); (ii) if a significant time shift between experimental and simulated data is present, the RDEP index does not reflect the actual deviation, and the RPPD index should be considered instead.
To further facilitate the comparison of the results from the studies in Table 3, we plotted together the resultant HJRF predicted by the authors that analyzed the same subjects and compared them with the corresponding in vivo measurements. This was done for patients HSR (sex = male, age = 55, height = 174 cm, weight = 860 N) and KWR (sex = male, Yellow squares indicate the experimental peak value (EP) and the simulated one, while the blue circle indicates the value assumed by the simulated force at the instant of EP age = 61, height = 165 cm, weight = 702 N) from Bergmann et al. (2001a). Both walking and stair climbing were considered. When multiple simulations were performed within a specific study, the scenario that produced the best fit was selected, i.e., p = 2 in Modenese et al. (2011) and the LLLM model in Weinhandl and Bennett (2019). According to the data made available by the authors, mean curves across trials are reported in Fig. 5, except those predicted by Heller et al. (2001) and Stansfield et al. (2003), which are obtained from one trial only (magenta and cyan curves). The deviation indices listed above were also calculated, together with the time shift at first peak and the Pearson's correlation coefficient (R) to assess similarity between curves (Table 4). In particular, RDPP and RDEP were estimated by using the absolute deviation between experimental and simulated force values.

Discussion
Some general considerations regarding the ability of MSK models to reproduce experimental HJRFs can be drawn by inspection of Fig. 5 and Table 4: different models and techniques produce significantly different predictions of HJRFs, both during walking and stair climbing. All studies included in this quantitative analysis are based on LB unilateral models, and, at least for Modenese et al. (2011); Weinhandl and Bennett (2019); Zhang et al. (2015), the same muscle recruitment criterion was adopted. This suggests that differences in HJRF predictions are mainly related to the different MSK characteristics of the models, as will be discussed in the next sections. Deviations from experimental measurements emerge both in the shape and in the magnitude of the predicted forces, as also supported by the values in Table 4. In general, MSK models tend to overestimate the in vivo measurements, especially during the weight acceptance phases of walking and stair climbing. Underestimation is more evident during the swing phase. Although obtained from one trial only, forces predicted by the model in Heller et al. (2001), based on the oldest cadaver template (among the selected studies), follow more closely the experimental reference with respect to the other models: the lowest RMSE, time shift and R are obtained for both patients in walking, while this is the case only for patient HSR in stair climbing ( Table 4). As it can be observed in Fig. 5, the HJRFs predicted for patient KWR during stair climbing are not well correlated to the in vivo ones, even when using more recent models. Regardless of the model, HJRF predictions are more accurate for the HSR patient compared to the KWR patient during stair climbing, while the opposite is true for walking. It is also worth noting that HJRFs predicted in    Table 4 Deviation indices, time shift at 1st peak, and correlation coefficient R calculated for patients HSR and KWR for walking and stair climbing Negative values in the time shift indicate that the peak of simulated force anticipates the experimental one minimization of squared muscle activations). In particular, Modenese et al. (2011) obtained better results: this might be due to other inaccuracies in the modeling procedures (e.g., scaling or operator-dependent errors).
The quantitative analysis carried out above seems to suggest, rather surprisingly, that the model in Heller et al. (2001), although the oldest among the selected ones, produces more accurate predictions of HJRFs both in walking and in stair climbing. Apparently, developing more detailed models could help obtain better predictions, but introducing several parameters with uncertain values may have the opposite effect (Roelker et al. 2017), with larger deviations from the experimental reference.
Identifying the modeling features that best reproduce experimental HJRFs is quite challenging. Indeed, models and approaches adopted in the collected studies differ in many respects, and a one-to-one determination of cause-effect relationships is not trivial at all. However, from a more general perspective, the influence of certain modeling choices can be investigated, as detailed in the sequel.

Lower-body vs full-body models
Investigating the influence of using LB or FB models on HJRF estimation was not straightforward. All studies included in the quantitative analysis of Table 3 are based on LB unilateral models, except for Fischer (2018) andDe Pieri et al. (2018) which used models including also the trunk and the head. In the latter cases, a relatively low RMSE with respect to the other studies was observed, but it is not significant enough to draw definitive conclusions. Also, the studies in Fig. 5 adopted LB unilateral models, thus hindering conclusive statements about the effect of the model structure on HJRF prediction. LB unilateral models are certainly more efficient from a computational standpoint, but their advantages over LB bilateral models is not clear. However, their adoption could be justified in the analysis of symmetric tasks, such as unimpaired walking. When motor tasks demanding the cooperation of both limbs are considered, or when subjects with significant asymmetry in lower limbs are analyzed, LB bilateral models may be preferred.
None of the selected studies used FB models, probably because upper limb kinematics was not registered in any of the experimental datasets. Current literature suggests that upper limb configuration does not have a relevant effect on HJRF prediction in walking at normal speed (Angelini et al. 2018), but additional studies are needed that compare HJRF predicted by LB and FB models. Indeed, the distribution of inertial properties may substantially change when upper limbs are modeled. Moreover, other activities of daily living where arm movement plays an important role should be investigated (e.g., fast walking, running, squat).

Models of the hip joint
More complex models including ligaments and elastic (but frictionless) interfaces at the hip joint (LaCour et al. 2020;Li 2021;Zhang et al. 2015) had limited success in improving the accuracy of the estimated HJRFs if compared to a simple 3-dof hip joint model with rigid contact and no ligaments. In both Zhang et al. (2015) and Li (2021), differences in the peak resultant HJRF predicted by the simple and complex hip models were less than 5% and 1%, respectively, both in walking and in stair climbing. These findings could be explained considering that (i) the resultant HJRF with elastic interfaces is basically the same as with rigid contact surfaces when elastic deformations are small and (ii) ligament forces are quite small compared to muscle forces when activities of daily living with small hip ranges of motion are performed. As a matter of fact, predictions obtained from these complex models are still affected by large deviations, both globally over the whole activity cycle and locally at specific time instants: as shown in Table 3, the RMSE in Zhang et al. (2015) is in the range 42-49% BW for walking, with a max RDEP of 9%, and in the range 38-55% BW for stair climbing, with a max RDEP of 23%, whereas the max RDEP for walking is 25% and 32% in Li (2021) and LaCour et al. (2020), respectively. Such findings are consistent with those from Fig. 5: the forces predicted by Zhang et al. (2015) did not significantly improve on the other models with simpler hip joints.

MSK geometry
MSK characteristics and tissue properties can markedly vary among individuals (Duda et al. 1996;Scheys et al. 2008), and for this reason, the accuracy of scaled generic models has been questioned, particularly when substantial geometric between-limb differences or MSK pathologies are present. SS models allow inclusion of individual MSK anatomy and properties (Akhundov et al. 2022). Model personalization can be performed fundamentally at two different levels: (i) MSK geometry (e.g., bone dimensions and musculotendon paths) and (ii) musculotendon parameters (e.g., the characteristic parameters of the Hill-type model), dealt with in Sect. 5.4.
The prediction of HJRFs revealed high sensitivity to the MSK geometry (Carbone et al. 2012), in particular to the hip joint geometry [hip joint center, neck length, neck shaft angle, femoral anteversion (Heller et al. 2001;Lenaerts et al. 2009Lenaerts et al. , 2008] and to the musculotendon geometry [paths and attachment points of the muscles spanning the hip (Carbone et al. 2012;Martín-Sosa et al. 2019)]. Also, bone dimensions have been shown to impact HJRF prediction (Koller et al. 2021). All these parameters are interconnected, as they affect the lines of action of muscles, hence their moment arms. MSK geometry also substantially varies between the different cadaver templates on which generic models are based and identifying which one is best suited for the prediction of HJRFs is not trivial.
Quantitatively, using the LLLM model decreased the max RPPD from 101% (ALLM) to 44% in slow walking, from 113% (hip2372) to 59% in normal walking, and from 121% (hip2372) to 49% in fast walking. However, the RMSE range was 42-65% BW, while the mean RPPD was around 45% for normal walking even with such detailed model (Table 3). The authors attribute such differences to the significant variability in muscle moment arms across the cadaver templates.
Interestingly, the performance of TLEM-based models seems to be exceeded by that of the model in Heller et al. (2001), which is based on the older VH cadaver template ("The National Library of Medicines Visible Human Project" 2022) (as evident from Fig. 5).
The importance of properly modeling the musculotendon geometry is also highlighted in De Pieri et al. (2018): the refinement of muscle paths and the introduction of wrapping surfaces into the TLEM model reduced the RMSE from 36% BW to 30% BW over the whole gait cycle, with a particular effect at the second peak (where the RPPD decreased from 14 to 10%) and during swing.
The recent study (Martín-Sosa et al. 2019) found that informing the model with muscle attachment points from SS MRI significantly affected the estimation of HJRFs: a difference of 35% BW between the scaled generic model and the SS model was observed. Such findings are consistent with those in Kainz et al. (2021), where the inclusion of personalized geometry (bones and muscle paths from MRI) of impaired subjects had a significant impact. In Wesseling et al. (2016), gradually increasing the level of SS detail, specifically through CT and MR images, improved HJRFs with respect to the scaled generic model, while accounting for the hip capsule geometry through wrapping surfaces further reduced discrepancies at the second peak of HJRFs in walking. Unfortunately, no quantitative analysis was possible since different datasets were used for simulation and comparison in Wesseling et al. (2016).
Among the studies compared in Fig. 5, only Stansfield et al. (2003) and Zhang et al. (2015) used CT images to define hip joint centers and bone dimensions, respectively. Nonetheless, such model choices did not significantly improve the HJRF prediction when compared to the other generic models, probably because of the insufficient level of SS details.

Musculotendon model
Muscle force generation is also affected by the characteristic parameters of the Hill-type musculotendon model, especially l T S , l M 0 , and F M 0 (Carbone et al. 2016;De Groote et al. 2010;Scovil and Ronsky 2006), and the level of sensitivity depends on the role each muscle plays during the analyzed task (Carbone et al. 2016). Currently, musculotendon parameters cannot be directly derived from MRI, and their generic values are typically obtained from repositories based on dissected cadavers (Klein Horsman et al. 2007). Only reference values for F M 0 have been estimated from MRI-segmented muscle volumes or regression equations (Handsfield et al. 2014). Given the considerable uncertainty in musculotendon parameter values, the use of a simplified muscle model, where muscle dynamics is neglected, seems a reasonable choice, at least for low-dynamic motor tasks. In Fischer (2018), the simplified muscle model improved the simulated HJRFs with respect to the 3-element model: at the local level, the mean RDEP decreased from 24 to 8% in walking and from 30 to 12% in standing, while at the global level the RMSE decreased from 54 to 40% BW in walking, and from 74 to 41% BW in standing. These results suggest that adopting erroneous musculotendon parameters may be worse than not estimating them at all. However, ignoring muscle dynamics is not advisable when analyzing more dynamic activities: in Modenese and Phillips (2012) and the deviation of predicted vs. experimental HJRFs increased with walking speed.
Personalization of musculotendon parameters can be achieved through EMG-driven calibration, as done in Hoang et al. (2019. It can potentially help obtain more accurate HJRFs, but future studies are needed that quantitatively assess the benefits of such strategy. When musculotendon parameters were calibrated by also minimizing the peak of the simulated resultant HJRF , the estimated loads were more comparable to their experimental counterparts (as expected). Also, F M 0 can be adapted to the specific subject according to different scaling laws based on segmented muscle volumes from SS MRI (Modenese et al. 2018).
Construction of SS models is quite expensive and timeconsuming, as a large collection of imaging data is necessary. This is probably the reason why generic models are mostly used. Currently, automated workflows for the creation of SS models are being developed, and they may increase the repeatability of SS simulations (Modenese and Kohout 2020;Modenese and Renault 2021). Although current literature findings steer researchers toward using SS models, the level of model personalization should be calibrated depending on the research objectives as well as on the characteristics of the analyzed subjects. The possibility of using "low-cost" generic MSK models should always be considered within the modeling workflow.

Muscle recruitment strategies
Understanding muscle recruitment is another crucial aspect in the estimation of accurate muscle forces. The actual criteria underlying human movement are still an open research topic. Assuming that the central nervous system identifies an optimal pattern of activations by optimizing a certain criterion, all the selected studies resolved muscle recruitment by minimizing pre-selected cost functions (Sect. 2). However, while such cost functions can be easily identified for simple motor tasks (e.g., jumping as high as possible), they become challenging in more complex activities such as walking, where the central nervous system controls the MSK system in a unique way. The studies (Fischer 2018;Modenese et al. 2011;Zhang et al. 2015) found that objective functions including muscle activations (or stresses) raised to a low exponent p provide more accurate HJRFs during walking, stair climbing and one-leg standing. In particular, Modenese et al. (2011) showed that switching from p = 1 to p = 15 resulted in an RMSE range increasing from 25-47% BW to 288-452% BW in walking, and from 22-66% BW to 335-445% BW in stair climbing. This is well depicted in Fig. 7, reproduced from Modenese et al. (2011), where HJRFs are plotted with different, increasing exponents of the cost function. Similar results were found in Fischer (2018) and Zhang et al. (2015), where better predictions were obtained with p = 2 and p = 3, respectively. Minimization of the total muscle force (i.e., p = 1) tends to maximize the contribution of one most effective muscle while minimizing the number of activated muscles. Increasing the exponent p favors a muscle force distribution in which all muscles contribute a little rather than having one (or a few) dominant muscle: in this sense, synergies are encouraged to keep the contributions of each single muscle as small as possible (van Bolhuis and Gielen 1999). Lastly, an infinite value for p is equivalent to the minimax criterion (i.e., minimization of maximum muscle activation) and is often associated with muscle fatigue (Ackermann and van den Bogert 2010;Damsgaard et al. 2006;Rasmussen et al. 2001). In this sense, very large p values allow more synergistic as well as antagonistic activity (i.e., coactivation) which, consequently, leads to the prediction of larger HJRFs, as discussed in Pedersen et al. (1987) and Wesseling et al. (2015). Attempts to reduce the overestimation of experimental HJRFs were made by augmenting the polynomial cost function with a term penalizing the magnitude of the resultant HJRF (Wesseling et al. 2016), but no significant improvement was obtained.
Despite their ease of implementation and computational efficiency, SO-based procedures have been extensively criticized on several aspects (Anderson and Pandy 2001a): (i) the accuracy of the results strongly depends on the accuracy of the experimental data, (ii) muscle dynamics is not fully taken into account, and (iii) the goal of the motor task may not be properly characterized as the performance criterion is not an integral cost. An early comparison of static and dynamic optimization was made by Anderson and Pandy (Anderson and Pandy 2001b): squared muscle activations were minimized in SO, while metabolic energy per unit distance traveled was minimized in the dynamic optimization approach. They found that the muscle activations predicted by SO and dynamic optimization were rather similar for low-dynamics activities such as walking, but the obtained JRFs were not validated against experimental data. Although excluded by the present analysis because based on a typical patient, the work by Wesseling et al. (2015) also investigated the effect of using a dynamic optimization approach [the Physiological Inverse Approach, PIA, by De Groote et al. (2009)], with respect to SO and CMC, for estimating HJRFs. The authors found that both SO and PIA obtained HJRFs closest to the experimental ones, at least for the typical patient. Such findings were motivated considering that CMC (i) allows muscle-generated moments to deviate from the net joint moments obtained by inverse dynamics, and (ii) it accounts for both muscle dynamics and passive forces in a way that induces a larger co-contraction of muscles. However, all three formulations overestimated the experimental HJRFs, probably due to an excessive co-contraction favored by the selected muscle recruitment criterion.
Another drawback of SO approaches is that they assume identical neuromuscular control strategies between individuals and tasks. This assumption, however, might not be appropriate for participants with neurological disorders. EMG-informed approaches allow to partly overcome such limitation as they take into account individual muscle cocontractions. Findings from Kainz et al. (2021), where children affected by cerebral palsy were analyzed, suggest that informing the model with SS neuromotor control has a minor effect on hip and knee JRF estimation when compared to the personalization of MSK geometry. It is rather the combination of both personalization strategies that significantly impacts JRF predictions. However, despite the use of EMGinformed formulations, large deviations from the experimental data are still (qualitatively) observable (Hoang et al. 2019. In case EMG-informed approaches are used, particular attention should be paid on the correct placement of the electrodes in order to minimize crosstalk phenomena. To shed light on the motor control policy encoded in the central nervous system, approaches based on optimal control techniques, which hold a strong predictive potential, have been implemented (Dembia et al. 2020;Falisse et al. 2019;Mombaur and Clever 2017;Nguyen et al. 2019;Tomasi and Artoni 2022). They rigorously treat model dynamics and integral cost functionals, favoring a more principled investigation of the motor control objectives underlying human movement, hence also a deeper understanding of muscle recruitment both in healthy and pathological subjects. However, the application of such strategies on the estimation of HJRFs has not been investigated yet. 5.6 Considerations on the use of experimental load measurements from instrumented implants Criterion IV.b was added to include the studies that can provide further indications on the use of experimental loads from instrumented implants as target trends, also when analyzing different subjects, either unimpaired or with pathologies.
For example, the two papers ( Fig. 8 for the walking activity. The mean curves of the predicted and experimental HJRFs are quite similar, particularly in the stance phase of walking, despite their different ranges of variation (Fig. 8A). However, the two mean curves in Fig. 8-A should be compared with some degree of skepticism, indeed, (i) the two datasets are very different in terms of number of subjects, i.e., only 10 for the OrthoLoad one and 132 for the other one; (ii) the MSK model, although previously validated in De Pieri et al. (2018), estimates HJRFs with an error of about 0.5 BW at the two peaks in walking; (iii) the results are presented in Newtons, thus potentially "masking" the discrepancies. Stratifications by age and functional ability were also reported ( Fig. 8B and C), and lower loads were obtained for aged and limited-function subjects, suggesting they adopt a "cautious" locomotion strategy.
Although the present review focused on those studies where a comparison, at least qualitative, with in vivo HJRFs was possible, it remains questionable if such comparison is meaningful for healthy subjects. As highlighted in , subjects with artificial joints generally have weaker muscles and altered neuromotor control with respect to healthy ones, and this may explain why the HJRFs they estimated for healthy subjects were generally higher than the available experimental measures. Also Wesseling et al. (Wesseling et al. 2016), where CT/ MRI-based models of healthy subjects were used, stress the consequence of neuromuscular impairments on joint loads, suggesting that these may be higher in healthy subjects than in subjects with artificial joints. In both studies, such differences are much more evident at the characteristic second peak in walking (i.e., terminal stance of the gait cycle). Likewise, Martelli et al. (Martelli et al. 2011) motivate the discrepancies between predicted and experimental measurements, especially in the stance-to-swing phase, considering the different health conditions of the analyzed subject and of the implanted subjects from OrthoLoad. In the same study, the influence of the neuromotor control strategy on joint loads was investigated: deviation from a reference neuromotor control resulted to drive the intensity of the internal body forces to higher levels. However, what control policy is physiologically and clinically plausible is still an open research question that should be addressed by the biomechanics community.

Conclusions
The present analysis has revealed that the estimation of accurate hip joint loads is still an open issue in biomechanics. A one-to-one investigation of cause-effect relationships was hindered by the fact that models and procedures used in the selected studies differed in more than one respect. Although several important improvements in musculoskeletal modeling and computational resources have been made over the last two decades, simulated hip joint loads still overestimate their experimental counterparts measured during activities of daily living. While it is difficult to ascertain the reasons of such overestimation, all the following aspects deserve attention: • In vivo measures of joint loads may be affected by experimental errors, e.g., those deriving from the calibration process of the instrumented implants. • Quality of motion capture data is crucial, as accurate marker placement and measurement of GRFs affect all the modeling phases. • When analyzing low-dynamic activities, neglecting muscle dynamics can be a reasonable simplification, and models without upper limbs seem suitable. • More detailed musculoskeletal models do not necessarily yield a more accurate estimation of joint loads: modeling complex hip joints with contact and ligaments seems unnecessary for improving hip joint load estimates. • Musculoskeletal geometry significantly impacts hip joint load prediction. Its adaptation to a specific subject is cohort compared with those measured from the OrthoLoad dataset (Bergmann et al. 2016); solid lines represent mean curves while shaded areas their range of variation. B and C Predicted resultant HJRF across the LLJ patient's cohort stratified by age and by functional ability, respectively more feasible than personalizing musculotendon parameters. However, effort and costs of model personalization should be justified by the purpose of the investigation. • Until the neuromotor control policy(ies) of the central nervous system has not been clearly identified, static optimization approaches are a simple and efficient strategy for solving muscle redundancy. However, cost functions with lower exponents should be used. If reliable EMG signals are available, EMG-informed approaches should also be considered for solving muscle redundancy.
It is worth pointing out that the above guidelines are general in scope, and they can be applied to the estimation of joint loads in other body districts. Future research should be devoted to developing automated, standardized, and accessible tools for model personalization, specifically in terms of musculoskeletal geometry. Also, additional studies are needed for the identification of motor control objectives in human movement that allow to obtain more physiologically plausible muscle forces, hence joint loads. It is also advisable that researchers share a common error metric to facilitate comparison between various joint load predictions.
In conclusion, we hope that the body of knowledge reviewed in this work can constitute a resource for biomechanists dealing with in silico estimation of joint loads.

Appendix
The present Appendix details the process to obtain the hip joint reaction forces (HJRFs). At first, Fig. 9A graphically clarifies the contributors to hip joint reaction forces, i.e., the contact actions at the articular surfaces and the ligament actions, which are equivalent to system in Fig. 9B represented by the resultant force F h applied at the hip joint center H, and a torque T h equal to the resultant moment about H. As mentioned in Sect. 2, it is commonly assumed that T h ≈ 0.
The equivalent system is considered in the following equilibrium of the limb, where the symbols F and T denote forces and torques, respectively. Figure 10 shows the system made of thigh, shank and foot and its corresponding free-body diagram. External GRFs ( F g and T g ) are measured experimentally through a force plate; the total weight P and the inertial actions F i and T i (reduced to the center of mass G) are known given system's inertial properties (mass, center of mass position, inertia matrix), geometry, and kinematics of each body segment; the forces F m i exerted by the N muscles crossing the hip joint are to be estimated. The interest is in obtaining the HJRFs F h and T h ; hence, a method to obtain the unknown F m i is necessary.
The typical approach to estimate F m i is based on a recursive process starting from the most distal body (i.e., the foot, for which F g and T g are known) and proceeding proximally toward the body of interest (i.e., the femur). It includes two stages: (i) inverse dynamics, which requires the model kinematics, inertial properties and applied external actions, and (ii) an optimization-based strategy to solve for muscle forces. This two-phase process is schematically shown in Fig. 11 for the femur body. At the inverse dynamics level (Fig. 11A), muscles are replaced by ideal joint torque actuators, and net joint forces and torques (denoted by a tilde) are obtained: these include contributions from the muscles crossing the joint and from all other unmodeled elements such as articular contact and ligaments. In the optimization phase (Fig. 11B), muscles are introduced, and their actions are estimated by optimizing a certain performance criterion (e.g., through static optimization) that redistributes T h across the muscles spanning the joint and acting on the femur body. It is worth highlighting that the two systems in Fig. 11A and B are dynamically equivalent. Once muscle forces are known, F h and T h can be obtained by solving the Newton-Euler equations where moments are calculated with respect to the center of the femoral head H: HP i , HG f , HK are the position vectors pointing from H to the points of application of muscle forces ( P i ), to the center of mass ( G f ), and to the knee joint center (K), respectively.
It is worth noting that inverse approaches on which estimation of joint reaction forces is based are intrinsically Fig. 9 A System of forces acting at the hip joint exerted by ligaments (green arrows) and by articular contact with the acetabular cup (orange arrows: normal contact forces, red arrows: tangential contact forces). B Corresponding equivalent system about H affected by cumulative errors that increase toward proximal joints. Thus, estimation of joint loads at the hip is affected by a larger error than the estimation of joint loads at the ankle. Fig. 10 A System made of thigh, shank, and foot. Representative muscles are shown in red through their lines of action. B Free-body diagram of the system (G is its center of mass). Muscles actuating the hip joint have been replaced by their corresponding forces F m i Fig. 11 Free-body diagrams of the femur ( G f is its center of mass) for inverse dynamics (A) and optimization (B)