The human eye is the main channel through which people obtain external information. Eye-tracking technology is used to study the visual information acquired by the eye, and it has promising applications in psychology and human-computer interaction [1–2]. By hardware type, current eye-tracking systems fall roughly into two groups: near-eye eye-tracking systems and desktop eye-tracking systems [3–6].
A desktop eye-tracking system first detects the tester's face, then locates and segments the eye area within the face, and finally locates the pupil center and calibrates the gaze point, recording eye movements from a distance. Near-eye eye-tracking systems come in two general types, helmet-mounted and glasses-type, and are mostly used in wearable devices. Because they record only the area around the eye, pupil extraction is easier than in desktop systems. However, limited by the narrow near-eye shooting space and by wearing weight, near-eye systems have less processing capability and run more slowly. In a helmet-mounted eye-tracking system [5], the hardware platform is concentrated on the helmet, which makes it relatively bulky and heavy; it is suitable for research experiments but not for daily life. In a glasses-type eye-tracking system [7], the camera acquisition module is installed on the spectacle frame, generally below it, so that it does not affect the wearer's field of view, conforms to the wearer's habits, and preserves wearing comfort and freedom of movement. The near-eye hardware used in this article is a glasses-type eye-tracking system. It can effectively record the tester's eye movements, and it meets the design requirements of a spectacle frame, so it remains practical and comfortable to wear.
In an eye-tracking system, the pupil positioning algorithm is the key link. Existing methods can be divided into data-based methods and knowledge-based methods. Commonly used data-based methods are convolutional neural networks [8–10] and AdaBoost [11]. The AdaBoost eye-detection algorithm based on Haar features is widely used [12–13]: it uses integral images and a cascade structure to perform statistical learning on the Haar features of eye samples and to locate the human eye area, improving eye-detection and positioning accuracy. Data-based methods do not require high image quality, but they need a large number of training samples, the training process is complicated, and the positioning accuracy is poor, so they are generally suited to rough positioning. Knowledge-based methods are the traditional image processing methods: selection and judgment are made through prior knowledge of the pupil's gray-level information, edge information, and shape characteristics. Their results are more accurate, and they are widely applied in precise pupil positioning.
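To make the role of the integral image concrete, the following is a minimal pure-Python sketch, not the detector used in this paper. It shows how a summed-area table lets any rectangle sum, and therefore a two-rectangle Haar-like feature, be evaluated with four lookups. The function names (`integral_image`, `rect_sum`, `haar_two_rect_vertical`) and the specific feature layout are illustrative assumptions.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Return an (h+1) x (w+1) summed-area table with a zero first row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above plus the running sum of the current row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_vertical(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle Haar-like feature: bottom-half sum minus top-half sum.

    Assumes h is even. A dark band (such as the eye) above a brighter band
    (such as the cheek) gives a large positive response.
    """
    half = h // 2
    top = rect_sum(ii, x, y, w, half)
    bottom = rect_sum(ii, x, y + half, w, half)
    return bottom - top
```

In a cascade, many such features would be evaluated per window and combined by the boosted classifier; that training stage is omitted here.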
Current common methods of pupil detection include image binarization combined with the Hough transform [14], Daugman's circle detection operator [15], the circular Hough transform approach described by R. H. Nugroho [16], and the SET algorithm, which extracts pupil pixels using a brightness threshold, traces the outline of the thresholded region, and compares it with a sine curve [17]. The Hough transform methods above are all applied to the entire eye image, so they involve a large amount of computation and high time complexity, and they have difficulty meeting real-time requirements. Moreover, because the camera is located below the eye, the recorded pupil appears as a near-circular ellipse rather than a true circle when the gaze position changes, so the positioning accuracy of circle-based methods of this type is low. Additionally, the eye image contains interference such as eyebrows, eyelashes, and eyelid edges, which affects the accuracy of pupil detection and pupil information acquisition.
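The cost of running a circular Hough transform over a whole image can be seen in a brute-force sketch. The code below is an illustrative assumption, not any of the cited implementations: `binarize` marks dark pixels as pupil candidates, and `hough_circle` lets every input point vote for every candidate center and integer radius whose circle passes within half a pixel of it. The nested loops make the work proportional to (points) x (width) x (height) x (radii), which is why restricting the search to a cropped eye region helps.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[int, int]


def binarize(img: List[List[int]], threshold: int) -> List[List[int]]:
    """Mark pixels darker than `threshold` as pupil candidates (1), others as 0."""
    return [[1 if v < threshold else 0 for v in row] for row in img]


def hough_circle(points: Iterable[Point], radii: Sequence[int],
                 width: int, height: int) -> Tuple[int, int, int]:
    """Return the (cx, cy, r) with the most votes.

    `radii` must contain positive integers. A point votes for (cx, cy, r)
    when its distance d to the center satisfies |d - r| <= 0.5; the test is
    done on 4*d^2 so that only integer arithmetic is involved.
    """
    votes: Counter = Counter()
    for x, y in points:
        for cx in range(width):
            for cy in range(height):
                d2x4 = 4 * ((x - cx) ** 2 + (y - cy) ** 2)
                for r in radii:
                    if (2 * r - 1) ** 2 <= d2x4 <= (2 * r + 1) ** 2:
                        votes[(cx, cy, r)] += 1
    if not votes:
        raise ValueError("no circle candidates received a vote")
    return max(votes, key=votes.get)
```

In practice the input points would be edge pixels of the binarized mask. Because the accumulator only models circles, an elliptical pupil spreads its votes over several centers and radii, which is consistent with the accuracy problem described above.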
To address these problems, this paper builds on the eye-tracking system of the glasses-type hardware platform and combines machine learning with traditional image processing. A trained model identifies the eye region, and the eye image is cropped to that region to remove redundant areas. A traditional image processing algorithm then selects the correct pupil fit to complete pupil positioning, and blink detection is implemented using the concept of the eye aspect ratio. This method shortens the pupil detection time and improves the pupil detection accuracy.
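As an illustration of blink detection with the eye aspect ratio (EAR), the sketch below assumes the commonly used six-landmark definition, EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), where p1 and p4 are the eye corners and the other points lie on the upper and lower eyelids. This section does not state the exact definition used, so the landmark layout, the threshold of 0.2, and the minimum run of 2 frames are all assumptions chosen for the example.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def eye_aspect_ratio(landmarks: Sequence[Point]) -> float:
    """EAR from six landmarks ordered p1..p6 (assumed layout, see lead-in)."""
    p1, p2, p3, p4, p5, p6 = landmarks
    vertical = _dist(p2, p6) + _dist(p3, p5)
    horizontal = _dist(p1, p4)
    return vertical / (2.0 * horizontal)


def count_blinks(ear_series: Iterable[float], threshold: float = 0.2,
                 min_frames: int = 2) -> int:
    """Count runs of at least `min_frames` consecutive frames with EAR below `threshold`."""
    blinks = 0
    run = 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # a blink still in progress at the end of the series
        blinks += 1
    return blinks
```

The EAR stays roughly constant while the eye is open and drops toward zero when it closes, so a short run of low values can be read as a blink. Requiring a minimum run length is one simple way to ignore single-frame noise.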