Image dehazing, a key prerequisite for high-level computer vision tasks, has gained extensive attention in recent years. Traditional model-based methods recover dehazed images via the atmospheric scattering model; they dehaze effectively but often introduce artifacts due to errors in parameter estimation. By contrast, recent model-free methods directly restore dehazed images with an end-to-end network and achieve better color fidelity. To combine the merits of these two approaches, we propose a physical-model-guided self-distillation network for single image dehazing. Specifically, we add three early-exit branches to a deep feature extraction network built from four attention-guided feature extraction blocks, obtaining both model-based and model-free dehazed images. Moreover, we propose a two-stage training optimization strategy to fuse the features of these intermediate dehazed images, and adopt self-distillation to transfer features from the deeper layers (acting as the teacher) to the shallow early-exit branches (acting as students). Experimental results on both synthetic and real-world images demonstrate that the proposed method outperforms state-of-the-art methods.
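For context, the atmospheric scattering model referenced above writes a hazy image as I(x) = J(x)t(x) + A(1 - t(x)), where J is the scene radiance, t the transmission map, and A the global airlight. The sketch below is not the proposed network but a minimal illustration of how a model-based method inverts this equation once t and A have been estimated (e.g. by a prior or a network); the function name, the `t_min` floor, and the estimation step are illustrative assumptions.

```python
import numpy as np

def dehaze_scattering_model(hazy, transmission, airlight, t_min=0.1):
    """Invert I = J * t + A * (1 - t) to recover J = (I - A) / t + A.

    hazy:         H x W x 3 hazy image in [0, 1]
    transmission: H x W estimated transmission map (assumed given)
    airlight:     length-3 estimated global airlight (assumed given)
    t_min:        floor on t; small errors in the estimated transmission
                  otherwise blow up the division, which is one source of
                  the artifacts mentioned above.
    """
    t = np.clip(transmission, t_min, 1.0)[..., None]  # broadcast over channels
    return np.clip((hazy - airlight) / t + airlight, 0.0, 1.0)
```

With an exact t and A, the inversion recovers the clean image; in practice both are estimated, and the `t_min` clamp is a common safeguard against division by near-zero transmission.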