To address the incomplete detection of instance targets, the difficulty of detecting occluded objects, and incomplete segmentation in complex scenes that arise when diffusion models are applied to multi-target instance segmentation, this paper introduces Pixel DiffusionSeg, an instance segmentation approach based on multi-level deformable attention and diffusion models. The method incorporates deformable self-attention, which adaptively adjusts the receptive field, into the proposed framework. In addition, it aggregates multi-scale features and introduces a pixel decoder to produce pixel-level predictions, thereby preserving the image’s detailed information and improving segmentation accuracy. Instance segmentation is treated as a process of denoising and reconstructing masks with a diffusion model, which recovers ground-truth masks from their noised versions without being biased by noisy candidate bounding boxes. Comparative experiments on the COCO val2017 dataset show that the enhanced model achieves significant improvements in the number of detected objects, object scores, segmentation completeness, and generalization performance. With a ResNet-50 backbone, the model reaches 37.45% AP, surpassing the 37.3% AP of DiffusionInst. Our code will be gradually made available at https://github.com/chenhongqian/Pixel DiffusionSeg.
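As a point of reference for the diffusion view of mask prediction, the following is a minimal sketch of the standard DDPM-style forward noising that diffusion-based instance segmentation methods such as DiffusionInst build on; the notation ($m_0$ for a ground-truth mask encoding, $\bar{\alpha}_t$ for the cumulative noise schedule) is illustrative and not taken from this paper:

$$q(m_t \mid m_0) = \mathcal{N}\!\left(m_t;\ \sqrt{\bar{\alpha}_t}\, m_0,\ (1-\bar{\alpha}_t)\mathbf{I}\right)$$

The segmentation head is then trained to reverse this corruption, recovering the clean mask encoding $m_0$ from the noised sample $m_t$ conditioned on image features, which is the sense in which the abstract describes segmentation as denoising and reconstructing masks.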