COVID-19 has spread globally, posing a serious risk and challenge to the world. Wearing a mask can reduce both the extent and the speed of virus transmission, making it an effective means of combating the spread of the coronavirus. Accordingly, mask-wearing requirements have become increasingly common in public places worldwide, and mask-wearing detection has become an important computer vision task for assisting the global community. The task involves complex and diverse scenarios, with missed and false detections caused by occluded facial features, varying target scales, poor detection of small targets, and small feature differences between correctly and incorrectly worn masks. To address these problems, we propose a new mask detection model, MFMDet, which uses a Recursive Feature Pyramid to process the multi-scale features extracted by the backbone network, enriching global features, enlarging the receptive field, and enhancing the detector's ability to adapt to different scales. To ensure that valid information is extracted accurately, we introduce modulated deformable RoI pooling into the detection head, enabling the network to better adapt to target deformation and enhancing the spatial and task awareness of the detection head. In addition, we use Joint Image Hybrid Augmentation to increase the number and diversity of training samples, improving the model's generalization ability. Experimental results show that our method improves on the baseline by 2.5 AP on the PWMFD dataset and outperforms other existing object detection algorithms. We also conducted experiments on the WMD dataset to further validate the generalization ability and effectiveness of the proposed method.
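To illustrate the Recursive Feature Pyramid idea mentioned above, the following is a minimal NumPy sketch, not the paper's implementation: the FPN outputs are fed back into the backbone for a second pass, which is the "recursive" connection. All function names and the toy downsampling/upsampling operations here are hypothetical stand-ins for real convolutional stages.

```python
import numpy as np

def backbone(x, feedback=None):
    """Toy backbone producing three feature maps at strides 2, 4, 8.
    Optional per-level feedback (the recursive connection) is added
    to each level before the next downsampling stage."""
    feats, f = [], x
    for i in range(3):
        f = f[:, ::2, ::2]  # toy stand-in for a conv stage that halves H and W
        if feedback is not None:
            f = f + feedback[i]  # inject previous FPN output at this level
        feats.append(f)
    return feats

def fpn(feats):
    """Toy top-down FPN: upsample the coarser level and add it to the finer one."""
    outs = [feats[-1]]
    for f in reversed(feats[:-1]):
        up = outs[0].repeat(2, axis=1).repeat(2, axis=2)  # nearest-neighbor x2 upsample
        outs.insert(0, f + up)
    return outs

def recursive_feature_pyramid(x, steps=2):
    """Unroll the backbone+FPN loop: the second pass sees FPN feedback,
    so backbone features are refined with pyramid-level (global) context."""
    feedback = None
    for _ in range(steps):
        feedback = fpn(backbone(x, feedback))
    return feedback

x = np.ones((8, 32, 32))  # (channels, H, W) toy input
outs = recursive_feature_pyramid(x)
print([o.shape for o in outs])  # multi-scale outputs at strides 2, 4, 8
```

The second unrolled pass is what distinguishes this from a plain FPN: each backbone stage receives a feedback map carrying information aggregated across scales, which is the mechanism MFMDet relies on to enlarge the receptive field.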