A novel semantic segmentation model is proposed to improve segmentation accuracy for small and occluded image targets. The model's multi-branch structure increases the number of gradient flow paths and helps prevent vanishing gradients. The architecture is based on the DeepLabv3+ framework and uses grouped convolutions in each bottleneck of the backbone network (ResNet50) to reduce both the number of model parameters and the model size. The network is also re-parameterized to increase inference speed, with little effect on segmentation results. A multi-scale hierarchical attention module (MHAM) is applied in parallel with atrous spatial pyramid pooling (ASPP) in the encoder, so that feature information output from the two modules can be fused to achieve adaptive segmentation of multi-scale targets. Transfer learning and data augmentation are also used to accelerate convergence and further improve model robustness. The proposed network was evaluated on an aerial semantic segmentation benchmark (AeroScapes) to assess segmentation performance for objects at different scales. The mean intersection over union (mIoU), calculated on the validation set, improved by 43.12% and 47.51% compared with the DeepLabv3+ (Xception65) and DeepLabv3+ (ResNet50) networks, respectively. In addition, the proposed network achieved a higher mIoU (84.98%) and a higher mean pixel accuracy (mPA, 97.57%) than six other advanced semantic segmentation networks (U-Net, RefineNet, PSPNet, DADA, DSRL, and HANet), as well as higher mIoUs than comparable algorithms on the Cityscapes (84.96%) and ADE20K (52.72%) datasets. It was also found that doubling the number of channels after the grouped convolutions did not significantly change the number of model parameters or the model size; however, the extracted features were more detailed and the segmented regions were more complete, indicating that the proposed model achieves better segmentation accuracy for small and occluded targets.
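The abstract does not include implementation details, but the parameter saving from grouped convolutions is easy to illustrate. Below is a minimal PyTorch sketch of a ResNet-style bottleneck whose 3x3 convolution is grouped; the channel widths and group count are assumptions chosen for illustration, not the configuration reported by the authors.

```python
import torch
import torch.nn as nn

class GroupedBottleneck(nn.Module):
    """ResNet-style bottleneck with a grouped 3x3 convolution.

    Illustrative only: the group count and channel widths are
    assumptions, not the configuration reported in the paper.
    """
    def __init__(self, in_ch=256, mid_ch=64, groups=4):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            # Grouped 3x3 conv: its weight count shrinks by a factor of `groups`
            # (dense: mid_ch*mid_ch*9 weights; grouped: mid_ch*(mid_ch/groups)*9).
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1,
                      groups=groups, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, in_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(in_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Residual connection preserves a direct gradient path.
        return self.relu(self.block(x) + x)

block = GroupedBottleneck()
print(sum(p.numel() for p in block.parameters()))
```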
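The re-parameterization scheme is not specified in the abstract. One common instance of inference-time re-parameterization is folding BatchNorm statistics into the preceding convolution, which removes a layer without changing the outputs; the sketch below shows that standard fusion as one plausible reading, not the authors' exact method.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BatchNorm into the preceding convolution.

    The fused conv produces (numerically) the same outputs as
    conv followed by bn in eval mode, with one fewer layer to run.
    """
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      conv.kernel_size, conv.stride, conv.padding,
                      conv.dilation, conv.groups, bias=True)
    # Effective per-channel scale applied by BN: gamma / sqrt(var + eps)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused

# Sanity check: the fused layer matches conv -> bn in eval mode.
conv, bn = nn.Conv2d(8, 16, 3, padding=1), nn.BatchNorm2d(16)
bn.eval()
x = torch.randn(1, 8, 32, 32)
print(torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5))
```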
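The internal design of the MHAM is likewise not detailed in the abstract. The sketch below only illustrates the parallel-branch idea: a simple channel-attention stand-in runs alongside a minimal ASPP on the same backbone features, and the two outputs are fused by concatenation and a 1x1 convolution. All module designs and the fusion rule are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SimpleASPP(nn.Module):
    """Minimal atrous spatial pyramid pooling: parallel dilated 3x3 convs."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style stand-in (assumed), not the paper's MHAM."""
    def __init__(self, in_ch, out_ch, reduction=8):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch, 1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // reduction, out_ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        y = self.proj(x)
        return y * self.gate(y)  # reweight channels

class ParallelEncoderHead(nn.Module):
    """Both branches see the same backbone features; outputs are fused."""
    def __init__(self, in_ch=2048, out_ch=256):  # ResNet50's last stage: 2048 ch
        super().__init__()
        self.aspp = SimpleASPP(in_ch, out_ch)
        self.attn = ChannelAttention(in_ch, out_ch)
        self.fuse = nn.Conv2d(out_ch * 2, out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([self.aspp(x), self.attn(x)], dim=1))

head = ParallelEncoderHead()
print(head(torch.randn(1, 2048, 16, 16)).shape)  # torch.Size([1, 256, 16, 16])
```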