Facial expression recognition has made significant progress, but it remains challenging in in-the-wild settings because of factors such as occlusion and pose variation. To address this, this paper proposes a robust framework for facial expression recognition, the Feature Fusion and Feature Decomposition Net (FFDNet). Specifically, to handle differences in the scale of facial regions, multi-scale feature fusion is applied in the feature extraction stage to obtain region features at different scales. A fine-grained module then decomposes these features into multiple fine-grained features, and an encoder captures discriminative features that reflect subtle differences. To improve feature diversity and reduce redundancy, a Diversity Feature Loss is proposed that drives the model to extract weakly correlated features and to mine richer fine-grained features. Extensive experiments on a benchmark dataset for facial expression recognition show that FFDNet outperforms competitive existing models, with particularly clear advantages under complex occlusion and pose variation.
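
The abstract does not state the form of the Diversity Feature Loss, so the following is only an illustrative sketch of the general idea of penalizing correlated fine-grained features, not the paper's formulation. It assumes the penalty is the mean squared cosine similarity over all pairs of feature vectors; the function names `cosine_similarity` and `diversity_penalty` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def diversity_penalty(features: Sequence[Sequence[float]]) -> float:
    """Assumed stand-in for a diversity loss: mean squared pairwise cosine similarity.

    The value is 0 when all fine-grained features are mutually orthogonal
    and approaches 1 when they all point in the same (or opposite) direction.
    """
    k = len(features)
    if k < 2:
        return 0.0
    total = 0.0
    pairs = 0
    for i in range(k):
        for j in range(i + 1, k):
            total += cosine_similarity(features[i], features[j]) ** 2
            pairs += 1
    return total / pairs
```

Under this assumption, minimizing the penalty alongside the classification loss would push the decomposed features toward low mutual correlation, which is the behavior the abstract attributes to the proposed loss.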