This paper investigates the identity inconsistency that arises in facial image inpainting. In current facial inpainting pipelines, most low-frequency information is learned directly from the unmasked regions, while high-frequency information, including texture details, is generated from the training dataset. The averageness of the low-frequency features and the unreliability of the high-frequency features cause identity deviations in the inpainted faces. To address this problem, we propose a two-stage generative model that combines structural prior guidance with multidimensional consistency guidance. First, a parsing inpainting module is designed to predict structural information. Then, a multidimensional information guidance module is constructed: under semantic consistency guidance, it extracts low-frequency confidence information from the inpainted image to suppress the averageness of facial inpainting. Frequency-domain information is also extracted from the inpainted image and, combined with a face recognition network, used to compute a frequency-domain identity feature consistency loss that aids the inpainting of high-frequency information, including identity features, thereby achieving identity-consistent facial inpainting. Furthermore, to evaluate the consistency of identity information between inpainted and real faces, we extract identity features from the faces generated by the inpainting network and use the distribution of these features as a measure of inpainting quality. Experiments on the CelebAMask-HQ dataset show, both qualitatively and quantitatively, that the proposed algorithm outperforms state-of-the-art methods.
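
To make the frequency-domain identity consistency idea concrete, the sketch below shows one plausible instantiation in PyTorch: the high-frequency component of each face is isolated with an FFT high-pass filter and passed through a frozen face recognition encoder, and the loss penalizes the cosine distance between the resulting identity embeddings. The `high_pass` cutoff radius, the `embed` encoder, and the cosine-distance formulation are illustrative assumptions, not necessarily the paper's exact design.

```python
# A minimal sketch of a frequency-domain identity consistency loss,
# assuming an FFT high-pass filter and a frozen face-recognition encoder.
import torch
import torch.nn.functional as F

def high_pass(img: torch.Tensor, radius: int = 8) -> torch.Tensor:
    """Keep only high-frequency content of a (B, C, H, W) image batch
    by zeroing a centered low-frequency square in the shifted 2-D FFT.
    The cutoff `radius` is an assumed hyperparameter."""
    spec = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    _, _, H, W = img.shape
    cy, cx = H // 2, W // 2
    spec[..., cy - radius:cy + radius, cx - radius:cx + radius] = 0
    return torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real

def freq_identity_loss(inpainted: torch.Tensor,
                       real: torch.Tensor,
                       embed) -> torch.Tensor:
    """Cosine distance between identity embeddings of the high-frequency
    components of inpainted and ground-truth faces. `embed` is assumed
    to be a frozen, pretrained face-recognition network (e.g. an
    ArcFace-style encoder returning one embedding per image)."""
    z_fake = embed(high_pass(inpainted))
    z_real = embed(high_pass(real))
    return (1.0 - F.cosine_similarity(z_fake, z_real, dim=-1)).mean()
```

In this reading, gradients flow from the identity embeddings back through the FFT filtering into the inpainting generator, so the generator is explicitly pushed to reconstruct the high-frequency cues that the recognition network relies on, rather than plausible but identity-agnostic texture.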