Receiver function (RF) inversion is a well-established method to quantify a horizontally layered approximation of the S-wave velocity structure beneath a seismic station using deconvolved signals of P to S converted teleseismic earthquake waves. It is well-known that the RF inverse problem is highly non-unique and non-linear, and various tools exist that may overcome this problem. One of the most-commonly used methods is joint inversion with other seismological data that are sensitive to S-wave velocity, such as surface waves or S-waveforms. In this contribution, we present a joint inversion framework along with a Python package that implements the joint inversion of RF and the so-called apparent S-wave velocity (VS,app). We assess its performance and feasibility through several synthetic tests. Our implementation includes a pseudo-initial model estimation using two alternative methods, which helps address the inherent non-uniqueness of the joint inversion of RFs and VS,app. This implementation enhances the resolving power of the joint inversion, enabling estimation of S-wave velocities with resolution approaching that of deep controlled source seismic methods. As an illustration, we showcase a real-data example from a permanent station in the Makran subduction zone southeast of the Iranian Plateau. We compare our joint inversion results with several S-wave velocity models obtained through a deep seismic sounding profile and joint inversion of surface wave dispersion and RFs. This comparison shows that although we note a slightly lower sensitivity of our proposed method at greater depths (beyond 50 km), the method yields much better results for shallow structures compared to joint inversion of RFs and surface waves.