Algorithm for Automatic Analysis of Smooth Pursuit Eye Movements Using a Combination of Video-oculography and Deep Learning–Based Object Detection

In the evaluation of smooth pursuit eye movements (SPEMs), recording the stimulus onset time is mandatory. In the laboratory, the stimulus onset time is recorded by an electrical signal or by programming, and the video-oculography (VOG) and the visual stimulus are synchronized. Nevertheless, because the examiner must manually move the fixation target, recording the stimulus onset time is challenging in daily clinical practice. Thus, this study aimed to develop an algorithm for evaluating SPEMs during nine-direction eye movement testing, without recording the stimulus onset time, using VOG and deep learning–based object detection (a single-shot multibox detector), which can predict the locations and types of objects in a single image. The peak fitting–based detection algorithm correctly classified the directions of target orientation and calculated latencies and gains within the normal range during nine-direction eye movement testing in healthy individuals. These findings suggest that the peak fitting–based detection algorithm has sufficient accuracy for the automatic evaluation of SPEMs in clinical settings.


Introduction
Eye movements include the ability to fixate and to track visual stimuli. In most ophthalmology clinics, the examiner evaluates smooth pursuit eye movements (SPEMs) subjectively, by noting their accuracy in relation to a target that the examiner moves manually in nine directions while the patient follows it with his or her eyes.1-8 Nevertheless, because laboratory methods for quantifying eye movements are constrained by the presentation of the target, no objective evaluation method has been established for daily clinical practice.9-11 To achieve high accuracy in eye movement testing, a predetermined target must be presented on a liquid crystal monitor under programmatic control. Presenting targets according to a prescribed protocol is difficult in the clinical setting, however, because the examiner must modify the movement of the target as appropriate to examine the suspected abnormality.12-16 In an approach to bringing the accuracy of eye movement measurement in the clinical field close to the laboratory level, Hirota et al.17 reported that a single-shot multibox detector (SSD),18 a deep learning–based object detection algorithm, achieved high accuracy in recognizing a manually moved target, and that the target location was significantly and highly positively correlated with the positions of both eyes as recorded by video-oculography (VOG; the VOG-SSD system). Moreover, the processing speed of the deep learning–based object detection algorithm was faster than that of simpler conventional algorithms that raster-scan each image.19 Although that study showed that the target location and both eye positions could be recorded simultaneously in real space, the system has not been extended to automatic eye movement analysis, such as latency and gain.
Recording the stimulus onset time is mandatory when evaluating eye movements. In the laboratory, the stimulus onset time is recorded by an electrical signal, and the VOG and the visual stimulus are synchronized. Nevertheless, because the examiner must manually move the fixation target, recording the stimulus onset time is challenging in daily clinical practice. Thus, this study aimed to develop an algorithm for evaluating SPEMs in nine-direction eye movement testing using the VOG-SSD system without recording the stimulus onset time.

Subjects
All subjects underwent ophthalmologic examinations, including stereoacuity testing (…, Chicago, IL, USA), heterophoria measurement by the alternating cover test at near (33 cm) and at distance (5.0 m), and fundus examinations. Stereoacuity was converted into the logarithm of the arc second (log arcsec). Table 1 presents the characteristics of the subjects. The mean ± standard deviation of the refractive error (spherical equivalent) was −3.23 ± 3.00 D in the dominant eye and −3.08 ± 2.80 D in the nondominant eye. The best-corrected visual acuity was 0.0 logMAR units or better in all subjects. The average heterophoria was −6.3 ± 5.9 prism diopters (PD) at distance and −10.9 ± 8.8 PD at near. All healthy volunteers had a stereoacuity of 1.62 ± 0.05 log arcsec (range, 40-60 arcsec).
After we explained the nature of the study and its possible complications, all subjects provided informed consent. This investigation adhered to the tenets of the World Medical Association Declaration of Helsinki. The Institutional Review Board of Teikyo University approved the experimental protocol and consent procedures (approval No. 19-224-2).

Apparatus
In this study, we used the VOG-SSD system developed by Hirota et al.17 We recorded eye movements during target tracking using a VOG device (EMR-9; NAC Image Technology Inc., Tokyo, Japan). The VOG device determined the eye positions by detecting the corneal reflex and the pupil center created by the reflection of near-infrared light, at a sampling rate of 240 Hz. The measurement error (interquartile range) was 0.2°-0.5° at a distance of 1.0 m. The scene camera recorded the real scene (resolution, 640 × 480 pixels; angle of view, ±31° from the center of the scene camera) at a sampling rate of 29.97 Hz. The gaze positions were merged with the real scenes with a delay of ≤52 ms.
Before performing the eye movement test, all subjects underwent a calibration test under binocular conditions with fully corrective spectacles to align their gaze positions with the scene-camera images. During calibration, all subjects were asked to fixate nine red cross targets (visual angle, 0.1°) on a white calibration plate; the nine red crosses, numbered from one to nine, were set at the following positions: …

The object detection algorithm used the same SSD model as in Hirota et al.,17,18 which detected the rabbit-like character target with an average precision of 99.7% ± 0.6% at a 75% overlap threshold. We used Python 3.8.5 for Windows 10 (Microsoft, Redmond, WA, USA) with the following libraries: …
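As an illustration of how per-frame target detection with an SSD model can be wired up, a minimal sketch follows. The torchvision SSD300-VGG16 architecture, the checkpoint name rabbit_ssd.pth, and the 0.5 score threshold are illustrative assumptions, not details taken from the study.

    # Minimal sketch: per-frame detection of the target with an SSD model.
    # Assumptions: torchvision's SSD300-VGG16 architecture, a hypothetical
    # fine-tuned checkpoint "rabbit_ssd.pth", and OpenCV frames in BGR order.
    import cv2
    import torch
    from torchvision.models.detection import ssd300_vgg16

    model = ssd300_vgg16(num_classes=2)  # background + rabbit-like target
    model.load_state_dict(torch.load("rabbit_ssd.pth"))  # hypothetical weights
    model.eval()

    def detect_target(frame_bgr, score_threshold=0.5):
        """Return the (x, y) pixel center of the best-scoring box, or None."""
        rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
        with torch.no_grad():
            det = model([tensor])[0]  # boxes come sorted by descending score
        if len(det["scores"]) == 0 or det["scores"][0] < score_threshold:
            return None
        x1, y1, x2, y2 = det["boxes"][0].tolist()
        return (x1 + x2) / 2.0, (y1 + y2) / 2.0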

Nine-direction eye movement testing
The target was a rabbit-like character that the SSD model had already learned in Hirota et al.17 The target size was 10 × 10 cm, which subtended a visual angle of 5.7° at 1.0 m. The examiner manually moved the target in nine directions (center, left, right, upper left, upper right, lower left, lower right, upper, and lower) within ±15°, in random order.
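For reference, the 5.7° value follows from the standard visual-angle relation, angle = 2 · arctan(size / (2 × distance)); a quick check:

    import math
    # visual angle subtended by a 10 cm target viewed at 1.0 m
    angle_deg = 2 * math.degrees(math.atan(0.10 / (2 * 1.0)))
    print(round(angle_deg, 1))  # -> 5.7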
All subjects were seated in a well-lit room (600 lx) wearing fully corrective spectacles. Each subject's head was stabilized with a chin rest and a forehead rest. During the eye movement test, the subjects were asked to fixate on the nose of the target, the visual angle of which was 0.1° at 1.0 m.

Filtering of both eye positions
We excluded VOG data when the change in pupil diameter was >2 mm/frame owing to blinking.20 We replaced the missing values (0.4% ± 0.7% of samples for all subjects) with linearly interpolated values calculated by an algorithm written in Python 3.8.5. The horizontal and vertical eye movements were analyzed, and SPEMs and saccadic eye movements were identified using a velocity-threshold identification (I-VT) filter.21 The I-VT filter classifies eye movements on the basis of the velocity of the directional shifts of the eye; a saccadic eye movement was defined as a median velocity over three consecutive windows of >100°/s. The eye position data at 240 Hz were then synchronized with the target data at 29.97 Hz.
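A minimal sketch of this preprocessing chain is given below, assuming pupil diameter (mm) and eye position (deg) as NumPy arrays sampled at 240 Hz; the 2 mm/frame blink criterion and the 100°/s saccade threshold come from the text, whereas the exact windowing and interpolation details are assumptions.

    # Sketch: blink exclusion, gap interpolation, and I-VT saccade labeling.
    import numpy as np

    FS = 240.0  # VOG sampling rate (Hz)

    def preprocess(pupil_mm, eye_deg):
        """Mask blink frames (>2 mm/frame pupil change) and fill by interpolation."""
        blink = np.abs(np.diff(pupil_mm, prepend=pupil_mm[0])) > 2.0
        x = np.arange(len(eye_deg), dtype=float)
        good = ~blink
        return np.interp(x, x[good], eye_deg[good])

    def ivt_saccades(x_deg, y_deg, threshold=100.0, window=3):
        """Label samples whose median speed over three windows exceeds 100 deg/s."""
        speed = np.hypot(np.gradient(x_deg), np.gradient(y_deg)) * FS
        pad = window // 2
        padded = np.pad(speed, pad, mode="edge")
        med = np.array([np.median(padded[i:i + window])
                        for i in range(len(speed))])
        return med > threshold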

Experiment 1
Eye movement testing involves moving the target in eight directions: left, right, upper left, upper right, lower left, lower right, upper, and lower. An algorithm is therefore needed that can identify the direction in which the examiner moves the target manually in the clinic, without a trigger input. In experiment 1, we compared the classification accuracy for each direction of target presentation between the peak fitting-based detection algorithm and a conventional threshold-based detection algorithm.

Procedures
In clinical practice, the origin of the scene camera (horizontal 0.0°, vertical 0.0°) and the position at which the examiner initially presents the target do not necessarily coincide (Fig. 1A, B). The medians of the horizontal and vertical target locations were calculated and defined as the relative origin. The target location and both eye positions were corrected for the difference from the relative origin (Fig. 1C).
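In code, this correction amounts to subtracting the per-axis median of the target trace from every signal; a minimal sketch, assuming target and eye traces as N × 2 NumPy arrays of (horizontal, vertical) angles:

    import numpy as np

    def to_relative_origin(target_xy, left_xy, right_xy):
        """Shift all traces so the median target location becomes (0, 0)."""
        origin = np.median(target_xy, axis=0)  # per-axis median = relative origin
        return target_xy - origin, left_xy - origin, right_xy - origin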
The target location calculated using the SSD was identified more than 99% of the time and was more stable than the eye positions, which were affected by blinks and tears. Thus, each direction was identified using the location of the target as a cue.
Algorithm of automatic detection for testing the directions of eye movements

Peak fitting-based detection
The target location was converted to a position vector, and the maximum and minimum peaks were then detected over 3.0 s (Fig. 2A, B). We separated the data between two minimum peaks, including one maximum peak. The separated data were decomposed from the position vector into horizontal and vertical components (Fig. 2C, D). After excluding 1 s from both ends of the separated data, the medians of the horizontal and vertical target locations were calculated (Fig. 2E, F).
The eight median horizontal and vertical locations were ranked from maximum to minimum for the left, right, upper, and lower directions, and the top three values in each of the four directions were grouped (Fig. 3A). The upper left, upper right, lower left, and lower right directions were identified by combining the horizontal and vertical groups (Fig. 3B). The remaining data in each group were the left, right, upper, and lower directions.
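A sketch that strings these steps together is shown below; the 3.0 s peak spacing, the 1 s trimming, and the top-three ranking follow the text, while the use of scipy.signal.find_peaks and the label bookkeeping are illustrative assumptions.

    import numpy as np
    from scipy.signal import find_peaks

    FS_TARGET = 29.97  # scene-camera sampling rate (Hz)

    def classify_directions(tx, ty):
        """Label the eight excursions of a nine-direction test by target medians."""
        r = np.hypot(tx, ty)                      # position-vector magnitude
        dist = int(3.0 * FS_TARGET)               # peaks at least 3.0 s apart
        maxima, _ = find_peaks(r, distance=dist)
        minima, _ = find_peaks(-r, distance=dist)
        trim = int(1.0 * FS_TARGET)               # drop 1 s at both ends
        medians = []                              # per-excursion (x, y) medians
        for p in maxima:
            lo = minima[minima < p]
            hi = minima[minima > p]
            if len(lo) == 0 or len(hi) == 0:      # skip unbounded excursions
                continue
            seg = slice(lo[-1] + trim, hi[0] - trim)
            medians.append((np.median(tx[seg]), np.median(ty[seg])))
        med = np.array(medians)                   # expect eight excursions
        labels = [set() for _ in med]
        order = np.argsort(med[:, 0])
        for i in order[:3]:
            labels[i].add("left")                 # three smallest horizontal medians
        for i in order[-3:]:
            labels[i].add("right")                # three largest horizontal medians
        order = np.argsort(med[:, 1])
        for i in order[:3]:
            labels[i].add("lower")
        for i in order[-3:]:
            labels[i].add("upper")
        return [" ".join(sorted(s)) for s in labels]  # obliques carry two labels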

Threshold-based detection
Threshold-based detection is a simple approach for identifying the category. In this study, the target data were compared against thresholds determined from the standard deviation of the target locations in five subjects, and a direction was assigned according to which thresholds the horizontal and vertical target locations exceeded.
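What such a rule might look like in code, assuming per-excursion median locations as in the peak-fitting pipeline and thresholds of one standard deviation from pilot data (the exact rule is not fully specified here):

    def threshold_classify(x_med, y_med, sd_x, sd_y):
        """Assign a direction by comparing excursion medians against +/- 1 SD."""
        label = []
        if y_med > sd_y:
            label.append("upper")
        elif y_med < -sd_y:
            label.append("lower")
        if x_med < -sd_x:
            label.append("left")
        elif x_med > sd_x:
            label.append("right")
        return " ".join(label) if label else "center"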

Statistical analysis
We evaluated the classification accuracy for each direction between the peak fitting-based and threshold-based detection algorithms using Fisher's exact test.
SPSS version 26 (IBM Corp., Armonk, NY, USA) was used to determine the significance of the differences, and a P value of <0.05 was considered statistically significant.
The findings of experiment 1 suggested that the peak fitting-based detection algorithm was suitable for evaluating eye movement testing.

Experiment 2
In experiment 2, we investigated an algorithm for the automatic calculation of latency and gain, which are evaluation indices of eye movements, using the data obtained by the peak fitting-based detection algorithm.

Calculating latency and gain
All directions of the horizontal and vertical target locations and both eye positions were converted to position vectors. The raw data were fitted with a cubic function, and each peak time was detected (Fig. 4A, B). Each peak time was then applied to the raw data (Fig. 4C). The latencies of both eyes were defined as the difference between the peak time of each eye and that of the target location.
The target location and both eye positions at the peak time were defined as the maximum values. We identified the points at the 25th and 75th percentiles of the maximum values in the centrifugal direction (Fig. 5).
We then fitted a linear regression line to the target location and to both eye positions between the 25th and 75th percentile points of the maximum values. The gains of both eyes were defined as the ratio of the slope of the regression line for each eye to the slope of the regression line for the target between the 25th and 75th percentile points.
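Read as code, the computation might proceed as in the sketch below, assuming a synchronized time base in seconds and position-vector traces in degrees for one excursion. The cubic peak fit and the 25th-75th percentile regression follow the text; interpreting the percentile points as 25% and 75% of the peak amplitude on the centrifugal (outgoing) limb is an assumption.

    import numpy as np

    def peak_time(t, pos):
        """Fit a cubic and return the time of its maximum within the data span."""
        c = np.polyfit(t, pos, 3)
        roots = np.roots(np.polyder(c))           # stationary points of the cubic
        roots = roots[np.isreal(roots)].real
        roots = roots[(roots >= t[0]) & (roots <= t[-1])]
        if len(roots) == 0:
            return t[np.argmax(pos)]              # fallback: raw sample peak
        return roots[np.argmax(np.polyval(c, roots))]

    def latency_and_gain(t, target, eye):
        """Latency (s) as peak-time difference; gain as slope ratio on the rise."""
        latency = peak_time(t, eye) - peak_time(t, target)
        peak = target.max()
        rise = slice(0, int(np.argmax(target)) + 1)   # centrifugal limb
        sel = (target[rise] >= 0.25 * peak) & (target[rise] <= 0.75 * peak)
        slope_target = np.polyfit(t[rise][sel], target[rise][sel], 1)[0]
        slope_eye = np.polyfit(t[rise][sel], eye[rise][sel], 1)[0]
        return latency, slope_eye / slope_target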

Statistical analysis
We tested for differences in the latencies and gains across directions within each eye using the Scheffé test, and we tested for differences in the latencies and gains between the two eyes in each direction using the Wilcoxon signed-rank test. The Bonferroni method was used to adjust the P values.
To determine the significance of the differences, we used SPSS version 26 (IBM Corp., Armonk, NY, USA), and a P value of <0.05 was considered statistically significant.

Results
The latencies did not differ significantly across directions within either eye (left eye, P > 0.150; right eye, P > 0.68; Fig. 6A, B; Table 3). The latencies in all directions did not differ significantly between the left (138.04 ± 89.36 ms across all directions) and right (144.75 ± 97.78 ms across all directions) eyes (P > 0.552; Fig. 6C; Table 3).
The gains did not differ significantly across directions within either eye (left eye, P > 0.75; right eye, P > 0.50; Fig. 7A, B; Table 3). The gains in all directions did not differ significantly between the left (0.943 ± 0.149 across all directions) and right (0.935 ± 0.133 across all directions) eyes (P > 0.99; Fig. 7C; Table 4).
The findings of experiment 2 suggest that, using the peak fitting-based detection algorithm, eye movements can be evaluated from data with the identified target directions.

Discussion
Recording the stimulus onset time is mandatory when evaluating eye movements. Nevertheless, because the examiner must move the fixation target manually, recording the stimulus onset time is challenging in daily clinical practice. In this study, we developed a peak fitting-based detection algorithm to evaluate SPEMs in nine-direction testing without recording the stimulus onset time. We found that the present algorithm had high accuracy in identifying the directions of target orientation in nine-direction testing.
The peak fitting-based detection algorithm correctly classified all directions of target orientation. We consider the present algorithm advantageous for identifying the oblique directions because it takes the top three values in each of the left, right, upper, and lower directions and detects an oblique direction by combining two directions, resulting in high accuracy. By contrast, the classification accuracy of the threshold-based detection algorithm was about half that of the peak fitting-based detection algorithm. Additionally, there are no standard criteria for determining the threshold, although in this study the threshold was determined from the standard deviation of five subjects. Because ±1 standard deviation contains only 68% of the data, the drop in the classification accuracy of the threshold-based detection algorithm to chance level indicates that the oblique directions strongly influence the automatic determination in nine-direction testing.
The latencies (mean latencies of the left and right eyes, 138.04 and 144.75 ms, respectively) and gains (mean gains of the left and right eyes, 0.943 and 0.935, respectively), which we calculated from the data of the identified target directions using the peak fitting-based detection algorithm, were similar to those reported in earlier studies: the latency of SPEMs was between 50 and 300 ms,22-24 and the gain of SPEMs was greater than 0.90 in healthy individuals at a velocity of 10°/s and a moving distance of 15°.25 These results suggest that the automatic method for calculating latency and gain implemented in this study may have a slight error compared with manual analysis, because of the fitting with cubic functions to calculate the vertices; however, the error is acceptable.

This study has a limitation. The method of identifying the top three combinations in the horizontal and vertical directions had high classification accuracy. However, when the same direction is examined twice, the accuracy might decrease owing to overflow in one direction. To overcome this limitation, we plan to use this algorithm to accumulate data and to apply machine learning to identify the target direction in future work.

Conclusion
The peak fitting-based detection algorithm correctly classified the directions of target orientation and calculated latencies and gains within the normal range during nine-direction eye movement testing in healthy individuals. These findings suggest that the peak fitting-based detection algorithm has sufficient accuracy for the automatic evaluation of SPEMs in clinical settings.

Data Availability
The data that support the findings of this study are openly available in Notion at https://www.notion.so/Algorithm-for-Automatic-Analysis-of-Smooth-Pursuit-Eye-Movements-Using-a-Combination-of-Video-oculog-af5cafa832f04c7297022795193cbd48#5ead35fc5352448098afb470aff41c38

Table notes
Table 1: S, subject; SE, spherical equivalent; D, diopter; PD, prism diopter; log arcsec, logarithm of arc second; SD, standard deviation.
Table 2: … subject; hence, 184 data points were analyzed. The peak fitting-based detection algorithm classified all directions of target orientation correctly. peak, peak fitting-based detection; threshold, threshold-based detection.

Figures

Figure 1
Setting the relative origin. In clinical settings, the target presented by the examiner (A) does not always coincide with the center of the scene camera (B). The medians of the horizontal and vertical target locations were calculated and defined as the relative origin (C).

Figure 2
Algorithm of peak fitting-based detection. The target location was converted to a position vector, and the maximum and minimum peaks were then detected over 3.0 s (A). (B) Data between two minimum peaks, including one maximum peak, in the green square of (A). The separated data were decomposed from the position vector into horizontal (C) and vertical (D) components. After excluding 1 s from both ends of the separated data (black vertical lines in C and D), the medians of the horizontal and vertical target locations were calculated and plotted (E). (F) Superimposition on the relative-origin data of the target, which are the same as in Figure 1C.

Figure 3
Categorizing each direction. The eight horizontal and vertical median locations were ranked from maximum to minimum for left (green ellipse), right (yellow ellipse), upper (blue ellipse), and lower (red ellipse), and the top three values in the four directions were grouped (A). The plots belonging to two groups (upper left, upper right, lower left, and lower right) were then identified by combining the horizontal and vertical directions (red squares in B). Each plot was numbered according to the order of the maximum peaks calculated in Figure 2A; thus, the fifth waveform is in the upper left.