In recent years, models based on the Swin Transformer architecture have substantially improved performance on many computer vision tasks. However, the effectiveness of the Swin Transformer architecture for gaze estimation has yet to be verified. In this paper, to introduce the Swin Transformer into gaze estimation, we propose the Pure Swin Transformer network (PST-Net) by adapting Swin-T. Building on this, to further improve performance and reduce computational complexity, we propose a new lightweight model, the Hybrid Swin Transformer network (HST-Net). To the best of our knowledge, this is the first application of the Swin Transformer to gaze estimation. HST-Net improves performance by adding CNNs before and after the Swin Transformer block. Experiments show that HST-Net clearly outperforms PST-Net and that its generalization performance exceeds that of previous methods. We further evaluate HST-Net on a range of publicly available datasets and analyze how the depthwise separable convolution placed after the Swin Transformer block affects the final results. The experiments demonstrate that, after pre-training, HST-Net outperforms GazeTR-Hybrid on all benchmarks, achieving the best performance while reducing parameters by 56.4% and FLOPs by 75.4%.
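The hybrid design described above (convolutions wrapped around a transformer stage, with a depthwise separable convolution after it) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class names are hypothetical, and a generic multi-head self-attention layer stands in for the actual Swin Transformer block with shifted-window attention.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv followed by a pointwise 1x1 conv.

    Uses far fewer parameters and FLOPs than a standard convolution,
    which is how such layers help keep a model lightweight.
    """

    def __init__(self, channels):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


class HybridBlock(nn.Module):
    """Illustrative hybrid stage: CNN -> attention block -> depthwise separable conv.

    The attention stage below is plain multi-head self-attention over
    flattened spatial tokens, standing in for a real Swin Transformer
    block (which would use windowed, shifted attention).
    """

    def __init__(self, channels, num_heads=4):
        super().__init__()
        # CNN placed *before* the transformer stage to extract local features.
        self.pre_conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Depthwise separable conv placed *after* the transformer stage.
        self.post_conv = DepthwiseSeparableConv(channels)

    def forward(self, x):
        x = self.pre_conv(x)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        q = self.norm(tokens)
        attended, _ = self.attn(q, q, q)
        tokens = tokens + attended  # residual connection around attention
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.post_conv(x)


# Example: a batch of 2 feature maps, 32 channels, 14x14 spatial size.
feats = torch.randn(2, 32, 14, 14)
out = HybridBlock(32)(feats)
print(out.shape)  # torch.Size([2, 32, 14, 14])
```

The sketch keeps the spatial resolution unchanged, so such a block can be stacked or dropped into an existing backbone between stages.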