Indicator-light detection in data center equipment rooms suffers from low accuracy and high computational cost because image backgrounds are complex and the targets are small. To address this, a lightweight Transformer-based small-target detection algorithm, named DCEI-RTDETR, is proposed. First, EfficientFormerV2 is adopted as the backbone for feature extraction. Reducing the number of downsampling stages enlarges the feature maps used for detection, so that the model attends more closely to small targets. Second, a High-Level Screening Feature Aggregation Network (HS-FAN) is designed as the hybrid encoder. The HiLo attention mechanism serves as the intra-scale feature interaction module on the high-level feature maps, while the GSConv and VoV-GSCSP modules perform cross-scale feature fusion, adaptively generating output weights for each feature level and dynamically improving the representational capacity of the feature maps. In addition, a grouped decoder with one-to-many label assignment is introduced to optimize object query processing, which alleviates occlusion and the loss of small-target feature information. Finally, the GIoU loss is replaced with Inner-EIoU, which uses a scaling factor to control the size of an auxiliary bounding box and thereby improves small-target detection accuracy. Experiments on a proprietary data center equipment status detection dataset show that, compared with the original RT-DETR, the proposed algorithm increases mAP50 by 4.2% and mAP50:95 by 2.1% while running at 90 FPS. Generalization experiments on the public VisDrone2021 dataset, where mAP50 improves by 4.4%, further demonstrate the effectiveness and generality of the proposed algorithm.
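The role of the scaling factor in the Inner-EIoU loss can be illustrated with a minimal sketch. It assumes the commonly published formulation, in which the IoU term is computed on auxiliary boxes that share the original centers but have their sides scaled by a ratio, and the EIoU center, width, and height penalties are computed on the original boxes. The function name `inner_eiou_loss`, the `(x1, y1, x2, y2)` box layout, and the default `ratio=0.75` are assumptions for illustration only and are not taken from the paper's implementation.

```python
def inner_eiou_loss(pred, target, ratio=0.75, eps=1e-9):
    """Illustrative Inner-EIoU loss for a single pair of (x1, y1, x2, y2) boxes.

    `ratio` is the scaling factor for the auxiliary boxes (hypothetical default).
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths, heights and centers of the original boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1
    pcx, pcy = (px1 + px2) / 2, (py1 + py2) / 2
    tcx, tcy = (tx1 + tx2) / 2, (ty1 + ty2) / 2

    # Auxiliary boxes: same centers, sides scaled by `ratio`.
    def aux(cx, cy, w, h):
        hw, hh = w * ratio / 2, h * ratio / 2
        return cx - hw, cy - hh, cx + hw, cy + hh

    ax1, ay1, ax2, ay2 = aux(pcx, pcy, pw, ph)
    bx1, by1, bx2, by2 = aux(tcx, tcy, tw, th)

    # IoU of the auxiliary boxes (the "inner" IoU).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (pw * ratio) * (ph * ratio) + (tw * ratio) * (th * ratio) - inter
    inner_iou = inter / (union + eps)

    # EIoU penalties, normalized by the smallest box enclosing the original boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    center_term = ((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / (cw ** 2 + ch ** 2 + eps)
    width_term = (pw - tw) ** 2 / (cw ** 2 + eps)
    height_term = (ph - th) ** 2 / (ch ** 2 + eps)

    return 1.0 - inner_iou + center_term + width_term + height_term
```

With `ratio=1.0` the auxiliary boxes coincide with the original ones and the sketch reduces to the plain EIoU loss. In this formulation, a ratio below 1 shrinks the auxiliary boxes and a ratio above 1 enlarges them, which changes how quickly the IoU term responds to misalignment.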