To offset the parameter growth caused by integrating infrared data while improving detection accuracy, this work introduces a fusion framework for visible and infrared thermal imaging. It presents the refined C2fv1k9 module and the MA4CBCA feature enhancement module, leveraging a hybrid attention mechanism. It also proposes the attention-based CorAF2n1 feature fusion module and a modified penalty term for the CIOU loss function. Experiments were conducted with the resulting model, yolov8n_f4s2c_m4ca2n1_cdiou5_cdiou5. Relative to the previously examined yolov8n_f4_scaff2_adf model, accuracy rose from 0.885 to 0.924 (+0.039), recall from 0.876 to 0.916 (+0.040), and mAP@50–95 from 0.711 to 0.728 (+0.017). These gains indicate that the model is more accurate and reliable than its predecessor and that it delivers strong detection performance with modest computational resources.
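
For orientation, the baseline that the modified penalty term starts from can be sketched as follows. This is a minimal sketch of the standard CIOU loss for a single pair of axis-aligned boxes, assuming an `(x1, y1, x2, y2)` corner format. It is not the modified penalty term proposed in this work, whose form is not given in this paragraph. The function name `ciou_loss` and the `eps` stabiliser are illustrative assumptions.

```python
import math


def ciou_loss(box_pred, box_gt, eps=1e-7):
    """Standard CIoU loss for two boxes given as (x1, y1, x2, y2).

    Illustrative baseline only; the modified penalty term is not reproduced here.
    """
    px1, py1, px2, py2 = box_pred
    gx1, gy1, gx2, gy2 = box_gt

    # Widths and heights of the predicted and ground-truth boxes
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter + eps
    iou = inter / union

    # Squared diagonal of the smallest box enclosing both boxes
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between the two box centres
    rho2 = ((px1 + px2) - (gx1 + gx2)) ** 2 / 4 + ((py1 + py2) - (gy1 + gy2)) ** 2 / 4

    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    # Loss = IoU term + centre-distance penalty + aspect-ratio penalty
    return 1 - iou + rho2 / c2 + alpha * v
```

The last two summands are the penalty terms: one grows with the normalised distance between box centres, the other with the mismatch in aspect ratio. Identical boxes give a loss close to zero, while disjoint boxes give a loss above one because the centre-distance penalty is added on top of `1 - iou`.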