Enhancing receptive fields with atrous convolution has proven to be an effective method in semantic segmentation. However, atrous convolution suffers from the "gridding effect" and does not capture long-range dependencies well, leading to problems such as loss of fine detail and class confusion. To address these problems, we design an encoder-decoder network named AEDN, built on two proposed modules: the Strip Spatial Attention Module (SSAM) and the Adaptive Fusion Module (AFM). Specifically, SSAM combines horizontal and vertical strip operations to capture long-range dependencies without associating every pair of pixels, thereby avoiding the encoding of irrelevant information. AFM adaptively fuses high-level and low-level features without dimensionality reduction, which helps to gradually recover the spatial detail of the image. The effectiveness of AEDN is verified experimentally on the PASCAL VOC 2012 and Cityscapes datasets: our network achieves 80.21% and 77.47% mIoU on the PASCAL VOC 2012 and Cityscapes validation sets, respectively, and better handles loss of detail and class confusion.
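The abstract does not specify how SSAM is realized. As a rough illustration of the general strip-attention idea it describes (row-wise and column-wise context instead of dense per-pixel association), the following PyTorch sketch pools features into horizontal and vertical strips and broadcasts them into a spatial attention map; all layer choices (pooling operators, kernel sizes, sigmoid gating) are assumptions for illustration, not the authors' design.

```python
import torch
import torch.nn as nn

class StripSpatialAttentionSketch(nn.Module):
    """Hypothetical sketch of strip-style spatial attention.

    Each position receives context from its entire row and column
    (long-range dependencies) without computing an attention weight
    for every pixel pair.
    """
    def __init__(self, channels: int):
        super().__init__()
        # Pool to a single column (H x 1) and a single row (1 x W).
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))
        # 1-D convolutions refine each strip descriptor.
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h_strip = self.conv_h(self.pool_h(x))   # (B, C, H, 1)
        w_strip = self.conv_w(self.pool_w(x))   # (B, C, 1, W)
        # Broadcasting the two strips yields a full H x W attention map.
        attn = self.sigmoid(h_strip + w_strip)
        return x * attn
```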
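Similarly, for AFM the abstract only states that high-level and low-level features are fused adaptively "without dimensionality reduction". One common way to gate channels without a reduction bottleneck is an ECA-style 1-D convolution over the channel descriptor; the sketch below uses that as a stand-in and assumes both inputs share the same channel count. It is a plausible reconstruction under these assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusionSketch(nn.Module):
    """Hypothetical adaptive fusion of low- and high-level features."""
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        # A 1-D conv over the channel descriptor produces per-channel
        # gates without the squeeze-and-excite reduction bottleneck.
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Upsample high-level features to the low-level resolution.
        high = F.interpolate(high, size=low.shape[2:],
                             mode="bilinear", align_corners=False)
        fused = low + high
        # Global descriptor -> per-channel fusion weight in [0, 1].
        w = fused.mean(dim=(2, 3)).unsqueeze(1)                # (B, 1, C)
        w = self.sigmoid(self.conv(w))                          # (B, 1, C)
        w = w.transpose(1, 2).unsqueeze(-1)                     # (B, C, 1, 1)
        # Convex per-channel combination of the two feature streams.
        return low * w + high * (1.0 - w)
```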