Pose-invariant facial expression recognition (FER) is an active yet challenging research topic in computer vision, because different observation angles make recognition results inconsistent from one view to another. In this work, a deep global multiple-scale and local spatial-channel attention (GM-LSCA) dual-branch network is developed for pose-invariant FER. The GM-LSCA network contains four main parts: the feature extraction module, the global multiple-scale (GM) module, the local spatial-channel attention (LSCA) module, and the decision-level fusion module. The feature extraction module extracts texture information and normalizes it to a common size. The GM module extracts deep global features at a granular level, integrating deep feature information and enhancing the representational capability of the network. The LSCA module forces the network to extract salient features from local patches, which reduces the sensitivity of deeper feature maps to self-occlusion and pose variation. Finally, decision-level fusion is used to further improve recognition accuracy. Extensive experiments were carried out on four public databases, and the results demonstrate the validity and feasibility of the GM-LSCA network.
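
To illustrate what decision-level fusion of a dual-branch network can look like, the following is a minimal sketch only. The abstract does not state the fusion rule, so a weighted average of per-branch class probabilities is assumed here; the function names `softmax` and `fuse_decisions` and the parameter `weight` are hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(global_logits, local_logits, weight=0.5):
    """Fuse two branch outputs at the decision level.

    ASSUMPTION: fusion is a convex combination of the class probabilities
    of the global branch and the local branch. `weight` is the share given
    to the global branch; the paper may use a different rule.
    """
    if len(global_logits) != len(local_logits):
        raise ValueError("both branches must score the same set of classes")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    p_global = softmax(global_logits)
    p_local = softmax(local_logits)
    fused = [weight * g + (1.0 - weight) * l
             for g, l in zip(p_global, p_local)]
    # Predicted expression class is the index of the largest fused probability.
    label = max(range(len(fused)), key=fused.__getitem__)
    return fused, label


if __name__ == "__main__":
    # Hypothetical scores for three expression classes from each branch.
    probs, label = fuse_decisions([2.0, 0.5, 0.1], [0.2, 0.3, 3.0], weight=0.5)
    print(probs, label)
```

Under this assumption, fusing probabilities rather than raw scores keeps the two branches on a comparable scale, so a confident branch can outvote an uncertain one without either branch dominating because of its logit magnitude.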