Self-occlusion and external occlusion significantly degrade the accuracy of monocular 3D dense face alignment. Approaches based on 3D Morphable Models (3DMM) have made considerable progress on this problem; however, although 3DMM is a simple and effective facial prior, it cannot by itself solve the robustness issues caused by the lack of occluded-face datasets. In this work, we present M4Net (Multi-scale features, Multi-head outputs, Mask augmentation, based on a 3D Morphable Model), a 3D dense face alignment method built on 3DMM with multi-scale features and a novel mask-based augmentation strategy that significantly improves robustness under occlusion without increasing training cost. Moreover, the encoder's representations of pose, shape, and expression are decoupled, so we can transfer a target expression to a source image simply by swapping the shape and/or expression representation vectors between the target and the source. Quantitative evaluation demonstrates that our approach achieves state-of-the-art results on AFLW2000-3D (68 points) and AFLW (21 points), and qualitative results on three different datasets indicate that our method is robust to occlusion.
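The expression-transfer idea above can be illustrated with a minimal sketch. The parameter layout and dimensions below are illustrative assumptions, not the actual M4Net configuration: we assume the encoder emits one flat vector with contiguous pose, shape, and expression segments, so transfer reduces to swapping the expression segment.

```python
import numpy as np

# Hypothetical layout of a decoupled 3DMM-style parameter vector:
# [pose | shape | expression]. Dimensions are illustrative only.
POSE_DIM, SHAPE_DIM, EXPR_DIM = 12, 40, 10

def split_params(p):
    """Split a flat parameter vector into pose, shape, and expression parts."""
    pose = p[:POSE_DIM]
    shape = p[POSE_DIM:POSE_DIM + SHAPE_DIM]
    expr = p[POSE_DIM + SHAPE_DIM:]
    return pose, shape, expr

def transfer_expression(source_params, target_params):
    """Return source parameters with the target's expression swapped in,
    keeping the source pose and shape unchanged."""
    pose_s, shape_s, _ = split_params(source_params)
    _, _, expr_t = split_params(target_params)
    return np.concatenate([pose_s, shape_s, expr_t])

rng = np.random.default_rng(0)
src = rng.standard_normal(POSE_DIM + SHAPE_DIM + EXPR_DIM)
tgt = rng.standard_normal(POSE_DIM + SHAPE_DIM + EXPR_DIM)
out = transfer_expression(src, tgt)
```

Because the representations are decoupled, swapping only the expression (or only the shape) segment leaves the other attributes of the source untouched; the decoder then renders the recombined vector.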