Conventional methods for assessing fetal well-being often require skilled clinicians and are susceptible to noise interference. Echocardiography, a primary technique for this purpose, is reliable but costly and requires specialized equipment and trained personnel, which limits its use in low- and middle-income countries. Phonocardiography (PCG) has recently emerged as a cost-effective alternative, but limited performance and high complexity have restricted its widespread use. In this study, we introduce Fetal Heart Sounds U-NetR (FHSU-NETR), a lightweight, easily deployable deep learning model that simultaneously extracts fetal and maternal heart activity from raw PCG signals. Validated on data from 20 subjects, all normal except for one case of fetal tachycardia arrhythmia, FHSU-NETR correctly identified 95% of the 35,960 fetal heartbeats in the dataset. This substantially outperforms the only previously published method evaluated on the same dataset, used here as the benchmark, which detected only 270 beats. Relative to the ground-truth fetal ECG, the model achieved a low mean difference in fetal heart rate estimation (-2.55 ± 10.25 bpm) across the entire dataset and correctly detected the arrhythmia case. Similarly, relative to the ground-truth maternal ECG, it achieved a low mean difference in maternal heart rate estimation (-1.15 ± 5.76 bpm). Its correct identification of the arrhythmia case suggests potential for real-world application and generalization. By leveraging deep learning, the proposed model could reduce reliance on medical experts for interpreting long PCG recordings, thereby improving efficiency in clinical settings.
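To make the reported heart-rate agreement figures concrete, the sketch below shows one common way to compute a mean difference ± standard deviation between heart rates derived from detected beats and those derived from reference (ECG) beats. It is a minimal illustration under stated assumptions, not the evaluation protocol used in this study: beat timestamps are assumed to be in seconds, heart rate is assumed to be taken per non-overlapping window as 60 divided by the mean RR interval, the ± term is assumed to be the sample standard deviation of the differences, and the function names (`heart_rate_bpm`, `windowed_heart_rate`, `mean_difference`) are hypothetical.

```python
import numpy as np


def heart_rate_bpm(beat_times_s):
    """Mean heart rate (bpm) from beat timestamps in seconds: 60 / mean RR interval."""
    rr = np.diff(np.asarray(beat_times_s, dtype=float))
    if rr.size == 0:
        raise ValueError("need at least two beats to form an RR interval")
    return 60.0 / rr.mean()


def windowed_heart_rate(beat_times_s, window_s, duration_s):
    """Heart rate per non-overlapping window (assumed windowing scheme).

    Windows containing fewer than two beats yield NaN.
    """
    beats = np.asarray(beat_times_s, dtype=float)
    rates = []
    for start in np.arange(0.0, duration_s, window_s):
        in_win = beats[(beats >= start) & (beats < start + window_s)]
        rates.append(heart_rate_bpm(in_win) if in_win.size >= 2 else np.nan)
    return np.array(rates)


def mean_difference(estimate_bpm, reference_bpm):
    """Mean and sample SD of (estimate - reference), skipping NaN windows."""
    diff = np.asarray(estimate_bpm, dtype=float) - np.asarray(reference_bpm, dtype=float)
    diff = diff[~np.isnan(diff)]
    return diff.mean(), diff.std(ddof=1)


if __name__ == "__main__":
    # Synthetic, illustrative beats only: reference at 140 bpm, estimate at 138 bpm.
    ref_beats = np.arange(0.0, 20.0, 60.0 / 140.0)
    est_beats = np.arange(0.0, 20.0, 60.0 / 138.0)
    ref_hr = windowed_heart_rate(ref_beats, window_s=5.0, duration_s=20.0)
    est_hr = windowed_heart_rate(est_beats, window_s=5.0, duration_s=20.0)
    mean_d, sd_d = mean_difference(est_hr, ref_hr)
    print(f"mean difference: {mean_d:.2f} ± {sd_d:.2f} bpm")
```

A negative mean difference in this convention means the estimate runs slower than the reference on average; the window length is a free design choice that trades temporal resolution against the stability of each per-window rate.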