Thyroid cancer is the most common malignancy of the endocrine system, and its incidence has risen worldwide in recent decades. Conventional software running on Central Processing Units (CPUs) and Graphics Processing Units (GPUs) is limited in processing speed, efficiency, and power consumption, which calls for more efficient solutions. This study proposes a Field-Programmable Gate Array (FPGA)-accelerated quantized inference method to improve the efficiency and accuracy of thyroid nodule detection in ultrasound images. We selected YOLOv4-tiny as the neural network model. At the software level, we used K-means++ clustering to determine anchor box dimensions suited to the data; at the hardware level, we combined 8-bit weight quantization with fusion of the batch normalization and convolution layers to reduce computational complexity. In addition, a double-buffering mechanism and a pipelined design were employed to increase parallelism and hardware resource utilization. We evaluated the system on the Tn3k dataset and on an internal dataset from a tertiary hospital in China. The proposed FPGA-accelerated ultrasound thyroid nodule detection system achieves an average accuracy of 81.44% on the Tn3k dataset and 81.20% on the internal test dataset, processes each image in 0.398 s, and consumes 3.119 W. Its energy efficiency is 17.6 times that of an Intel Core i5-10200H CPU and 0.98 times that of a GeForce RTX 4090 GPU. This study offers a new technological pathway for medical image diagnosis that could increase the speed and accuracy of ultrasound image analysis, thereby improving physician efficiency and diagnostic capability.
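To make the batch normalization/convolution fusion and 8-bit weight quantization more concrete, the following is a minimal sketch, not the authors' implementation. It folds batch normalization parameters into convolution weights and biases, then applies symmetric per-tensor int8 quantization to the fused weights. The per-tensor symmetric scheme, the [-127, 127] range, and all function names (`fold_batchnorm`, `quantize_int8`, `conv_bn_reference`, `int8_conv`) are hypothetical assumptions, since the exact quantization scheme is not specified here. Only weights are quantized, and a single output position is used, so each convolution reduces to a dot product between a flattened kernel and the input patch.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # weights[c] is the flattened kernel of output channel c


def fold_batchnorm(weights: Matrix, bias: Sequence[float], gamma: Sequence[float],
                   beta: Sequence[float], mean: Sequence[float], var: Sequence[float],
                   eps: float = 1e-5) -> Tuple[Matrix, List[float]]:
    """Fold y = gamma * (conv(x) + bias - mean) / sqrt(var + eps) + beta into one conv."""
    fused_w, fused_b = [], []
    for c, kernel in enumerate(weights):
        scale = gamma[c] / math.sqrt(var[c] + eps)  # per-output-channel BN scale
        fused_w.append([w * scale for w in kernel])
        fused_b.append((bias[c] - mean[c]) * scale + beta[c])
    return fused_w, fused_b


def quantize_int8(weights: Matrix) -> Tuple[List[List[int]], float]:
    """Assumed symmetric per-tensor scheme: w ~= scale * q with q in [-127, 127]."""
    max_abs = max(abs(w) for kernel in weights for w in kernel)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(w / scale))) for w in kernel] for kernel in weights]
    return q, scale


def conv_bn_reference(x: Sequence[float], weights: Matrix, bias: Sequence[float],
                      gamma: Sequence[float], beta: Sequence[float], mean: Sequence[float],
                      var: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Unfused float path: dot product per output channel, then batch normalization."""
    out = []
    for c, kernel in enumerate(weights):
        z = sum(w * xi for w, xi in zip(kernel, x)) + bias[c]
        out.append(gamma[c] * (z - mean[c]) / math.sqrt(var[c] + eps) + beta[c])
    return out


def int8_conv(x: Sequence[float], q: List[List[int]], scale: float,
              fused_b: Sequence[float]) -> List[float]:
    """Fused path: multiply-accumulate with int8 weights, then one rescale and bias add."""
    out = []
    for c, kernel in enumerate(q):
        acc = sum(qi * xi for qi, xi in zip(kernel, x))
        out.append(acc * scale + fused_b[c])
    return out


if __name__ == "__main__":
    x = [0.2, -0.4, 0.9]
    w = [[0.5, -1.0, 0.25], [0.1, 0.3, -0.2]]
    b = [0.05, -0.1]
    gamma, beta, mean, var = [1.2, 0.8], [0.1, -0.3], [0.02, 0.0], [0.9, 1.1]
    fw, fb = fold_batchnorm(w, b, gamma, beta, mean, var)
    q, s = quantize_int8(fw)
    print(conv_bn_reference(x, w, b, gamma, beta, mean, var), int8_conv(x, q, s, fb))
```

Folding removes the separate normalization step at inference time, so the accelerator only needs a multiply-accumulate followed by one scale and bias per output channel. The quantization error per weight is at most half a quantization step.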
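The anchor-box step can be sketched as K-means++ clustering of ground-truth box sizes. This is a minimal illustration under stated assumptions, not the authors' code. It uses the distance d = 1 - IoU between (width, height) pairs aligned at a common corner, a convention widely used for YOLO anchor selection, although the distance actually used in the study is not stated. The function names (`iou_wh`, `kmeanspp_init`, `kmeans_anchors`), the mean-based centroid update, and the default of six anchors (the number YOLOv4-tiny uses) are assumptions.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) of a ground-truth box


def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes sharing a top-left corner, so only width and height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def distance(a: Box, b: Box) -> float:
    """Assumed clustering distance: 1 - IoU, larger when box shapes differ more."""
    return 1.0 - iou_wh(a, b)


def kmeanspp_init(boxes: List[Box], k: int, rng: random.Random) -> List[Box]:
    """K-means++ seeding: sample each new centroid with probability proportional to D(x)^2."""
    centroids = [rng.choice(boxes)]
    while len(centroids) < k:
        d2 = [min(distance(b, c) for c in centroids) ** 2 for b in boxes]
        if sum(d2) == 0:  # every box already coincides with a centroid
            centroids.append(rng.choice(boxes))
        else:
            centroids.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centroids


def kmeans_anchors(boxes: List[Box], k: int = 6, iters: int = 100, seed: int = 0) -> List[Box]:
    """Cluster box sizes into k anchors: K-means++ seeding followed by Lloyd iterations."""
    rng = random.Random(seed)
    centroids = kmeanspp_init(boxes, k, rng)
    for _ in range(iters):
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for b in boxes:
            nearest = min(range(k), key=lambda i: distance(b, centroids[i]))
            clusters[nearest].append(b)
        # Move each centroid to the mean size of its cluster; keep empty clusters unchanged.
        updated = [
            (sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:  # assignments have converged
            break
        centroids = updated
    return sorted(centroids, key=lambda c: c[0] * c[1])  # smallest anchor first


if __name__ == "__main__":
    boxes = [(10, 10)] * 5 + [(100, 100)] * 5
    print(kmeans_anchors(boxes, k=2))
```

Using 1 - IoU rather than Euclidean distance keeps large boxes from dominating the clustering, and the D(x)^2 seeding makes it less likely that two initial centroids fall in the same group of similarly shaped boxes.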