Infrared acquisition technology is often used for target detection in military applications, since it is not easily disturbed by environmental factors. However, low image contrast, complex imaging backgrounds, long imaging distances, and the lack of visual features make infrared small target detection a challenging task. Generic deep neural network-based models are difficult to apply to infrared small targets, since increasing the number of network layers can lead to the gradual loss of target features and positional information. To address these issues, we propose a mask-guided detection model with coarse-to-fine candidate selection for infrared small multi-target detection. More specifically, to enhance target features and guide the localization process in the neural network, we utilize a foreground mask generated from the non-local self-correlation property of the infrared background and the sparsity of the target distribution. The obtained mask is treated as a prior to re-weight the convolutional feature maps. Because of the complex background, multi-target detection is prone to misdetections. Therefore, we propose a coarse-to-fine candidate selection method on top of the initial detection results. A shallow network is constructed to extract more nuanced visual features from the candidate positions for binary classification, in which false positive candidates are ruled out without interference from other background features. Moreover, given the lack of multi-target infrared datasets, we propose two synthetic datasets based on publicly available and self-collected infrared data. Extensive experimental results verify the effectiveness and advantages of our model compared with state-of-the-art methods.
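
As an illustration of using a foreground mask as a prior to re-weight convolutional feature maps, the following is a minimal sketch and not the model's actual implementation. The function name `reweight_features`, the gain parameter `alpha`, the residual form `1 + alpha * mask`, and the nearest-neighbour resizing are all assumptions made for this example; the abstract does not specify how the re-weighting is carried out.

```python
import numpy as np

def reweight_features(features, mask, alpha=1.0):
    """Re-weight feature maps with a foreground mask (hypothetical sketch).

    features: array of shape (C, H, W), convolutional feature maps.
    mask:     array of shape (Hm, Wm) with values in [0, 1], where larger
              values mark pixels more likely to belong to a target.
    alpha:    assumed gain controlling how strongly the mask boosts features.
    """
    _, h, w = features.shape
    # Nearest-neighbour resize of the mask to the feature-map resolution.
    rows = np.arange(h) * mask.shape[0] // h
    cols = np.arange(w) * mask.shape[1] // w
    resized = mask[np.ix_(rows, cols)]
    # Residual re-weighting: background responses are kept unchanged,
    # while responses at masked (foreground) locations are amplified.
    weights = 1.0 + alpha * resized
    return features * weights[None, :, :]

# Usage: a 4x4 mask with a small foreground region, applied to 2x2 feature maps.
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0
features = np.ones((2, 2, 2))
out = reweight_features(features, mask)
```

The residual form is chosen here so that an imperfect mask cannot fully suppress a true target; a purely multiplicative weighting (`features * resized`) would be an equally plausible reading of the description.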