Driver attention prediction plays a crucial role in the development of intelligent driving and driver-assistance systems. However, the task poses several challenges, including the difficulty of effectively exploiting driving-scene information and the lack of models that can accurately predict a driver's multiple regions of fixation. To address these challenges, this work proposes a novel multi-scale feature fusion network for driver attention prediction (MSFFDAP). MSFFDAP uses a convolutional neural network to extract multi-scale features and, exploiting the strong complementarity among these features, adopts a two-stage fusion strategy to make full use of the information in the driving scene and to predict the driver's multiple fixation locations more accurately. In the first stage, the network fuses the features of each pair of adjacent scales separately. In the second stage, a bidirectional densely connected network fuses the multi-scale features, enabling the extraction of comprehensive contextual information. Furthermore, coordinate attention is integrated with Conv-LSTM to accurately capture spatiotemporal relationships. Experimental results on two challenging datasets show that the proposed MSFFDAP adequately fuses information from driving scenarios and predicts multiple simultaneously occurring critical regions more accurately than state-of-the-art methods.
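The two-stage fusion idea above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the averaging-based fusion, nearest-neighbour resampling, and the collapsing of the bidirectional (top-down and bottom-up) passes into a single dense aggregation are all simplifying assumptions made here for clarity; the actual network uses learned convolutional fusion operators.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour upsampling: (H, W, C) -> (2H, 2W, C).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def downsample2x(x):
    # 2x2 average pooling: (2H, 2W, C) -> (H, W, C).
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def fuse_adjacent(feats):
    """Stage 1 (sketch): fuse each pair of adjacent scales by
    upsampling the coarser map and averaging it with the finer one.
    `feats` is ordered fine -> coarse; each scale halves H and W."""
    return [(fine + upsample2x(coarse)) / 2.0
            for fine, coarse in zip(feats[:-1], feats[1:])]

def dense_fusion(fused):
    """Stage 2 (sketch): densely connected fusion -- each output
    aggregates every stage-1 map, resampled to its own resolution.
    The bidirectional passes of the paper are simplified here into
    one dense sum over all scales."""
    outs = []
    for f in fused:
        acc = np.zeros_like(f)
        for g in fused:
            h = g
            while h.shape[0] < f.shape[0]:   # coarser map: upsample
                h = upsample2x(h)
            while h.shape[0] > f.shape[0]:   # finer map: downsample
                h = downsample2x(h)
            acc += h
        outs.append(acc / len(fused))
    return outs
```

For example, with three feature maps of sizes 8x8, 4x4, and 2x2, stage 1 yields two fused maps (8x8 and 4x4), and stage 2 returns context-enriched maps at those same resolutions.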