GMM-based GNSS spoofing detector using double differential phase measurement

GNSS spoofing is an intentional interference in which a GNSS receiver tracks counterfeit signals and consequently outputs incorrect positions. It is one of the most dangerous types of interference in GNSS technology. In this study, we introduce a novel method for detecting spoofing signals by applying a Gaussian mixture model (GMM) to the double-carrier phase difference (DD) formed from two independent receivers. The DD cancels the errors caused by the satellite clock, the receiver clocks, and the atmospheric layers; hence, the signal angle of arrival (AoA) is clearly expressed in the DD measurement. We use the GMM to model the probability density function of the DD measurements computed from the phase measurements of the receivers. Theoretically, the AoA of an authentic signal changes over time because the signal is broadcast from an orbiting satellite. Fake signals, in contrast, are often transmitted from a single generator, so the corresponding AoA values concentrate around a common value. Existing studies address the spoofing detection problem using these theoretical assumptions. In practice, however, spoofing signals can be broadcast from several sources, and random noise can be mixed into the generated phase to make the DD measurement noisy. In such complicated scenarios, existing approaches are not sufficiently robust to detect non-authentic signals. Another observation is that, because real satellites move in fixed orbits, the AoA values of the signals coming from these satellites should be correlated. Counterfeit signals (whose main purpose is to cause a wrong position or a noisy phase) do not follow the pattern of the real signals. Therefore, instead of relying on the hard assumption about the signal AoA, we propose using a statistical model, the GMM, to learn the hidden relationship among the AoA values of real signals from different satellites; this learned relationship is then used to detect spoofing signals.


Introduction
Nowadays, a large number of services, such as vehicle monitoring, aircraft navigation, unmanned aircraft, and automation in agriculture, require precision and integrity of the Global Navigation Satellite Systems (GNSS). However, these services can become targets for terrorists and hackers. One of the most dangerous attacks is the use of counterfeit signals, whose structure is simulated to be the same as that of real satellite navigation signals, to provide incorrect measurements to the receiver. This type of attack, called spoofing [1][2][3][4][5], does not interrupt the GNSS receiver; hence, the user may be unaware that the positioning result is invalid. In recent years, scientists have studied and developed methods to detect spoofing GNSS signals. Existing methods can be categorized into three main approaches: cryptographic security [6][7][8], signal-based techniques [3][9][10][11][12][13], and methods that leverage external sources of verification [14][15][16][17]. Most techniques discussed in the literature target single-antenna receivers because this is the most common operating condition for active receivers. However, recent studies [3,9,11,18] have indicated that spoofers are expected to broadcast fake signals from one antenna on the ground, while authentic signals are transmitted by satellites in orbit from different directions; therefore, spatial information from two or more antennas on the receiver side is more powerful and effective for detecting fake signals.
The authors in [9][10][11][19] developed a method for spoofing detection based on a double-carrier-phase difference (DD) using a pair of receivers and antennas. This approach utilizes the relative position between two antennas and does not require specialized hardware or special constraints on the geometry of the system. The above studies used a simple assumption: all fake GNSS signals come from the same direction; therefore, the geometrical features are similar for all single-difference carrier-phase measurements, and these similarities are almost completely canceled during the calculation of double-difference phase measurements. In contrast, geometrical features remain in the measurements calculated from authentic signals because they are broadcast from different directions. Based on this principle, the authors of [9] designed a sum-of-squares (SoS) detector to distinguish fake and real GNSS signals. The proposed algorithm was successfully tested using a dual-antenna system with two independent commercial receivers.
Although the SoS method has proven to be a simple but effective technique for detecting spoofing attacks, it has some limitations. Because the SoS uses carrier-phase measurements, it is prone to cycle slips [10], which may cause false predictions. Furthermore, SoS does not consider situations in which fake and authentic signals are combined [9]. In [10,19], the authors improved the SoS by adding a cycle-slip detector and utilizing the dispersion of double difference (D3) measurements to detect irregularities. The D3 detector works as expected and is capable of detecting fake signals even when they are mixed with authentic signals [13,19]. However, the D3 algorithm is cumbersome, and errors may occur when the DD value of a real satellite pair crosses the DD of a fake pair. From the above analysis, we concluded that using only the raw values of the DD measurement is not sufficiently robust to detect irregularities in the GNSS signal, particularly when some inauthentic signals are mixed with those from real satellites. We also observed that there should be some hidden correlations among the DD values of real satellite signals because these satellites move in fixed periodic orbits. Because this correlation has no analytical form, it is almost impossible for a spoofer to reproduce when generating spoofing signals. However, we can statistically learn this hidden correlation using stochastic models with data collected from real satellites while they are moving in orbit. Therefore, we propose combining the DD measurements with a statistical model to detect fake signals. Our experiments and analysis indicated that this approach achieved good detection results with high computational complexity.
In this paper, we propose the use of a Gaussian mixture model (GMM) to detect fake signals. As in [20], the GMM can create an approximate density function for a training dataset by weighting many Gaussian functions. Not only does the GMM provide a smooth overall distribution fit, but it also describes a method for classifying multimodal data with low computation.
The remainder of this paper is organized as follows: section 2 introduces the DD model of the two receivers, and section 3 describes the GMM. Section 4 describes the proposed approach and provides visual evidence. The experimental results are presented in section 5. The final section concludes the paper and discusses the advantages and disadvantages of the approach.

Differential carrier-phase model
If we consider two independent commercial receivers that simultaneously receive signals from one satellite, we can use their output measurements to calculate a single carrier-phase difference for each satellite as follows:

$$\Delta\phi_i = \phi_i^{(2)} - \phi_i^{(1)} = \Delta r_i + \Delta N_i \lambda + c\left(\delta T^{(2)} - \delta T^{(1)}\right) + \Delta\varepsilon_i \tag{1}$$

where:
• Indices (1) and (2) denote measurements from the two receivers,
• $\phi_i$ is the carrier-phase measurement for the ith satellite in the monitoring (i = 0, 1, 2, …, I), expressed in meters,
• $r_i$ is the geometric distance between the receiver and the ith satellite, with $\Delta r_i = r_i^{(2)} - r_i^{(1)}$,
• $\Delta N_i \lambda$ is the difference between the two unknown integer carrier-phase ambiguities, scaled by the carrier wavelength $\lambda$,
• $\delta T$ is the receiver clock error and $c$ is the speed of light,
• $\Delta\varepsilon_i$ is the residual noise that cannot be modeled, including thermal noise and multipath.

It is worth mentioning that when the distance between the two receivers is short, the ionosphere and troposphere errors of the two receivers can be considered to be the same and cancel in the differential calculation. The difference in geometric distance between the ith satellite and the two receivers can be computed using the following equation:

$$\Delta r_i = D\cos(\alpha_i) \tag{2}$$

where D is the distance between the two antennas and $\alpha_i$ is the angle of arrival (AoA) of the ith satellite's signal, as depicted in figure 1. The DD value of the ith satellite, calculated by subtracting from its single difference that of a reference satellite (denoted by the zero index), removes the clock error $\delta T^{(2)} - \delta T^{(1)}$ from equation (1):

$$DD_i = \Delta\phi_i - \Delta\phi_0 = D\left[\cos(\alpha_i) - \cos(\alpha_0)\right] + \left(\Delta N_i - \Delta N_0\right)\lambda + \left(\Delta\varepsilon_i - \Delta\varepsilon_0\right) \tag{3}$$

The DD in equation (3) is the basic quantity for building a spoofing detector, as discussed in the following sections.
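The single- and double-difference steps described above can be illustrated with a short Python sketch. It is a minimal example on synthetic numbers, not the authors' implementation: the satellite identifiers, the baseline length, the AoA values, and the common clock term are all made up for illustration, and noise and integer ambiguities are set to zero.

```python
import math

def single_difference(phase_rx1, phase_rx2):
    """Per-satellite single difference (receiver 2 minus receiver 1), in meters."""
    return {sat: phase_rx2[sat] - phase_rx1[sat]
            for sat in phase_rx1 if sat in phase_rx2}

def double_difference(phase_rx1, phase_rx2, ref_sat):
    """DD of every non-reference satellite with respect to ref_sat."""
    sd = single_difference(phase_rx1, phase_rx2)
    return {sat: sd[sat] - sd[ref_sat] for sat in sd if sat != ref_sat}

# Synthetic, hypothetical scenario (all values are assumptions).
D = 1.0                 # antenna separation in meters
clock_term = 123.456    # common receiver clock difference, already in meters
aoa_deg = {"G01": 30.0, "G07": 75.0, "G12": 120.0}
phase_rx1 = {"G01": 2.0e7, "G07": 2.1e7, "G12": 2.2e7}

# Receiver 2 sees the same phase plus the geometric term D*cos(alpha)
# and the clock term that is common to all satellites.
phase_rx2 = {sat: phase_rx1[sat] + D * math.cos(math.radians(a)) + clock_term
             for sat, a in aoa_deg.items()}

dd = double_difference(phase_rx1, phase_rx2, ref_sat="G01")
for sat, value in sorted(dd.items()):
    print(sat, round(value, 4))
```

In this noise-free setting the clock term drops out and each DD reduces to the difference of the cosine terms scaled by the baseline.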

GMM
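According to [20], the density function of the GMM is the weighted sum of the component density functions. For a pattern classification system of N classes, we have a set of GMMs $\{\lambda_1, \lambda_2, \ldots, \lambda_N\}$ linked to the N classes.
For a D-dimensional feature vector $\vec{x}$ (here D denotes the feature dimension, not the antenna separation defined earlier), the nth GMM density function is defined as follows:

$$p(\vec{x} \mid \lambda_n) = \sum_{i=1}^{M} \omega_i^n \, b_i^n(\vec{x})$$

where M is the number of mixture components; $\omega_i^n$, i = 1, …, M, are the weights of the Gaussian components, satisfying $\sum_{i=1}^{M} \omega_i^n = 1$; and $b_i^n(\vec{x})$ are the component densities, each a Gaussian with mean vector $\vec{\mu}_i^n$ and covariance matrix $\Sigma_i^n$:

$$b_i^n(\vec{x}) = \frac{1}{(2\pi)^{D/2}\,|\Sigma_i^n|^{1/2}} \exp\left\{ -\frac{1}{2} (\vec{x} - \vec{\mu}_i^n)^{T} (\Sigma_i^n)^{-1} (\vec{x} - \vec{\mu}_i^n) \right\}$$

A GMM density function is parameterized by the mixing weights, mean vectors, and covariance matrices and is denoted as

$$\lambda_n = \left\{ \omega_i^n, \vec{\mu}_i^n, \Sigma_i^n \right\}, \quad i = 1, \ldots, M.$$

In a GMM-based classification system, the goal of model training is to estimate the parameters of the GMMs so that the mixture's density function best fits the distribution of the training feature vectors. In the classification phase, the GMM with the highest density value determines the class label assigned by the classifier.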
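As a rough illustration of how such a mixture density is evaluated, the following Python sketch computes a GMM density for a single feature vector. To stay within the standard library it assumes diagonal covariance matrices, which is a simplification made here and not something stated in the text; the function names are likewise illustrative.

```python
import math

def gaussian_diag_pdf(x, mean, var):
    """Density of a Gaussian with diagonal covariance (var holds the diagonal)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def gmm_pdf(x, weights, means, variances):
    """Weighted sum of the component densities; weights are assumed to sum to 1."""
    return sum(w * gaussian_diag_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Toy two-component mixture in two dimensions (values are made up).
weights = [0.3, 0.7]
means = [[0.0, 0.0], [1.0, 1.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
density_at_origin = gmm_pdf([0.0, 0.0], weights, means, variances)
print(density_at_origin)
```

A full-covariance version would replace the per-dimension loop with the determinant and inverse of each covariance matrix.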

Description of the proposed approach
We used a mixed dataset that included signals from both real and fake satellites, as shown in figures 2 and 3. This mixed dataset consisted of three fake satellites (pseudo-random-noise (PRN) code numbers 1, 10, and 11) and five real satellites. From figures 2 and 3, we can see that the threshold-based SoS algorithms may mistakenly identify all satellites as real satellites (in figure 3, when a real satellite is used as the reference, the DD values are all high). Although some improvements have been made in [10], these approaches still require long-term observations to decide whether a satellite is fake (because the absolute threshold value is useless, those approaches analyze the trend of the DD values in a time window).
As explained above, we propose a method to learn the hidden correlations among the DD values of satellites instead of using absolute DD values. Therefore, we need a method to include this geometric correlation in the feature vectors extracted from the DD values. Real satellites move periodically on fixed orbits; therefore, at any time of day, any combination of four satellites should form a repeatable geometric configuration. This configuration is hidden in the DD values extracted from the signals. Because this type of geometric configuration is not visible and has no analytical representation, any combination that includes one or more fake satellites will not follow this rule. Therefore, it is reasonable to use this fact to detect spoofing attacks.
Assuming that the receivers acquire a set of satellites denoted as $S = \{s_1, s_2, \ldots, s_n\}$, we construct all groups of four different satellites $G_i = \{s_p, s_q, s_k, s_l\}$, i = 1, 2, 3, …, N, where N is the total number of groups. In each group $G_i$, we used each satellite in turn as the reference to compute the DD values of the remaining three satellites. Therefore, at time epoch t, for a group $G_i = \{s_p, s_q, s_k, s_l\}$, we can calculate a total of four three-dimensional points corresponding to the four references, and the calculation is performed over all time epochs to create point clouds of all possible groups. To clarify the correlation of the satellites, we plotted all the points for our dataset, as illustrated in the figures below. It is worth noting that all point clouds were quite well separated, particularly the authentic point clouds, which were calculated from four authentic satellites.
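The group construction can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the input is a dictionary of per-satellite single differences at one epoch, the satellite identifiers and values are invented, and the order of the three coordinates (sorted satellite identifier) is a choice made here because the text does not specify one.

```python
from itertools import combinations

def group_feature_points(single_diff):
    """Build the 3-D DD points for every four-satellite group at one epoch.

    single_diff maps a satellite identifier to its single difference (meters).
    Returns a list of (group, reference, point) tuples; each group yields
    four points, one per choice of reference satellite.
    """
    points = []
    for group in combinations(sorted(single_diff), 4):
        for ref in group:
            others = [sat for sat in group if sat != ref]
            point = tuple(single_diff[sat] - single_diff[ref] for sat in others)
            points.append((group, ref, point))
    return points

# Hypothetical single differences for five satellites at one epoch.
single_diff = {"G01": 0.10, "G05": 0.35, "G07": -0.20, "G12": 0.05, "G20": 0.60}
points = group_feature_points(single_diff)
print(len(points))  # 5 groups x 4 references
```

With n visible satellites, the number of points per epoch is four times the number of four-element subsets, so the cost grows quickly with n.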
Figure 4 illustrates the distribution of points for the different types of satellite groups.
• Spoofed 1a2s: points of a group with one authentic and three fake satellites; one of the fake satellites is used as the reference.
• Spoofed 2a1s: points of a group with two authentic and two fake satellites; one of the fake satellites is used as the reference.
• Spoofed 3a: points of a group with three authentic satellites and one fake satellite; the fake satellite is used as the reference.
• Spoofed 3s: points of a group with four fake satellites; the reference is necessarily one of the fakes.
• Aut 2a1s: points of a group with three authentic satellites and one fake satellite; one of the authentic satellites is used as the reference.
• Aut 3a: points of a group with four authentic satellites; one of the authentic satellites is used as the reference.
• Aut 3s: points of a group with one authentic and three fake satellites; the authentic satellite is used as the reference.
• Aut 1a2s: points of a group with two authentic and two fake satellites; one of the authentic satellites is used as the reference.
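The naming scheme in this list follows a simple rule: the prefix records whether the reference is fake or authentic, and the suffix counts the authentic and fake satellites among the remaining three. A small Python sketch of that rule is given below; the function name and the PRN numbers used for the real satellites are hypothetical.

```python
def cloud_label(group, ref, fake_sats):
    """Return the point-cloud label for a four-satellite group and its reference."""
    prefix = "Spoofed" if ref in fake_sats else "Aut"
    others = [sat for sat in group if sat != ref]
    n_fake = sum(1 for sat in others if sat in fake_sats)
    n_auth = len(others) - n_fake
    if n_fake == 0:
        suffix = "3a"
    elif n_auth == 0:
        suffix = "3s"
    else:
        suffix = f"{n_auth}a{n_fake}s"
    return f"{prefix} {suffix}"

fake_sats = {1, 10, 11}      # fake PRNs named in the text
group = (1, 10, 11, 20)      # PRN 20 is a made-up authentic satellite
print(cloud_label(group, 20, fake_sats))
print(cloud_label(group, 1, fake_sats))
```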
Using this approach, spoofing attacks that require complicated mechanisms for detection or visualization [10,11,19] are now easily presented, as shown in figure 5. In figure 5, we illustrate an example of spoofing using more than two fake and two real satellites. Because we generated all possible combinations of the four satellites, there must be groups with two fake and two authentic satellites. The corresponding point clouds were distributed at specific locations, as shown in figure 5. Because the DD values of the fake satellites are similar owing to the AoA assumption, if the reference is fake, the clouds (black and red) should be located on the zero-crossing planes; if the reference is real, the clouds (yellow and green) should appear on the bisector planes.
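The location of these clouds can be motivated with a short worked derivation. The sketch below assumes that both fake signals arrive from one common direction $\alpha_s$ and neglects the integer-ambiguity and noise terms of the DD; both simplifications are assumptions made here for illustration only.

```latex
% Simplified DD of satellite i with respect to reference 0
% (integer ambiguity and noise neglected):
\[ DD_i \approx D\,[\cos\alpha_i - \cos\alpha_0] \]

% Case 1: the reference is fake (\alpha_0 = \alpha_s) and f is the other fake satellite:
\[ DD_f \approx D\,[\cos\alpha_s - \cos\alpha_s] = 0 \]
% One of the three coordinates is close to zero, so the point lies
% near a coordinate (zero-crossing) plane.

% Case 2: the reference is real and f_1, f_2 are the two fake satellites:
\[ DD_{f_1} \approx DD_{f_2} \approx D\,[\cos\alpha_s - \cos\alpha_0] \]
% Two of the three coordinates are nearly equal, so the point lies
% near a bisector plane such as x = y.
```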
In figure 6, we combine all groups with at least one fake satellite (spoofed), and all groups with all four real satellites (authentic); the point clouds of the two categories are plotted below.It is clear that authentic clouds (from all four real satellites) are separable from fake clouds.
Figures 4-6 show the possibility of using three-dimensional feature vectors, constructed from a combination of four satellites with one of them as the reference, for the classification of spoofing GNSS data. In the following paragraphs, we further analyze a complicated scenario in which only one satellite is fake. This scenario is challenging because none of the existing AoA-based approaches [9][10][11][19] can handle it: when only one satellite is fake, the assumption regarding the arrival direction no longer holds. Fortunately, our approach works well in such difficult scenarios, as illustrated in figure 7.
The above visualization indicates that it is feasible to develop a method that identifies whether there are any spoofed satellites at a specific time epoch. It is clear that the spoofed and authentic points are well separated in all the situations analyzed above. Based on our observations, we implemented the spoofing identification engine using models that parameterize the distribution of points (one for spoofed and one for authentic) and then calculated the probability density of a given point under each model to determine to which model the point belongs.

Experimental evaluation
For our experiment, we used the mixed-signal dataset mentioned above. This dataset contains data from real satellites and a GNSS signal generator (spoofer). To record this dataset, we used a pair of short-baseline antennas, as shown in figure 8 (we placed three antennas but used only two of them); each antenna was connected to one receiver (Septentrio AsteRx4). The spoofer (IFEN NAVX NCS Essential GNSS simulator) generated deceptive signals (in synchronization with real satellite information) and broadcast them through an antenna directed toward the receivers' antennas (see figure 8). Because the receiving antennas were located outside, the receivers recorded both authentic and spoofing signals.
We used k-fold cross-validation (k = 10) to divide the dataset into training and testing sets. From the original dataset, we extracted groups of four satellites at each time epoch, labeled each group as spoofed (if at least one of the satellites was from the generator) or authentic, and then extracted the three-dimensional feature vector, as described in the previous section. The entire dataset of feature vectors was divided into ten parts, nine of which were used for training, and the remaining part was used for testing.
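The ten-part split can be sketched as a generator of index lists. This is a generic k-fold illustration using only the Python standard library; the function name, the seed, and the sample count are arbitrary choices and not details from the experiment.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Interleaved slicing gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(95, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each feature vector appears in exactly one test fold, so every sample is used for testing once and for training nine times.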
In the training phase (see figure 9), from the set of feature vectors (with labels), we used the expectation maximization (EM) algorithm to train two GMMs (one for spoof points and one for authentic points), where the output of the training phase is a set of two probability density functions (pdfs) parameterized by mean vectors, covariance matrices, and the mixing weight of Gaussian pdfs.
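To make the training step concrete, the following Python sketch fits one GMM with the EM algorithm. It is a simplified illustration rather than the procedure used in the experiments: it assumes diagonal covariance matrices, a fixed number of iterations, random initialization of the means from the data, and a variance floor, none of which are specified in the text. In the setting described above, it would be run twice, once on the spoofed feature vectors and once on the authentic ones.

```python
import math
import random

def _log_gauss_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def fit_gmm_diag(data, n_components, n_iter=50, seed=0, var_floor=1e-6):
    """Fit a diagonal-covariance GMM with EM; returns (weights, means, variances)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    means = [list(p) for p in rng.sample(data, n_components)]
    centre = [sum(p[d] for p in data) / n for d in range(dim)]
    spread = [max(sum((p[d] - centre[d]) ** 2 for p in data) / n, var_floor)
              for d in range(dim)]
    variances = [list(spread) for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            logs = [math.log(w) + _log_gauss_diag(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            exps = [math.exp(value - top) for value in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means, and variances.
        for k in range(n_components):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = [sum(r[k] * x[d] for r, x in zip(resp, data)) / nk
                        for d in range(dim)]
            variances[k] = [
                max(sum(r[k] * (x[d] - means[k][d]) ** 2
                        for r, x in zip(resp, data)) / nk, var_floor)
                for d in range(dim)]
    return weights, means, variances

# Toy 3-D data with two clusters (synthetic, for illustration only).
rng = random.Random(1)
toy = [tuple(rng.gauss(0.0, 0.5) for _ in range(3)) for _ in range(100)]
toy += [tuple(rng.gauss(10.0, 0.5) for _ in range(3)) for _ in range(100)]
weights, means, variances = fit_gmm_diag(toy, n_components=2)
print(weights)
```

A production implementation would typically use full covariance matrices and a convergence test on the log-likelihood instead of a fixed iteration count.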
In the testing phase, a feature vector (computed from a group of four satellites at a specific time epoch) without a known label was fed into the two GMM pdfs to generate two scalar probability density values. The detector then compared these two values and declared that the group contains a fake satellite if the spoofed value was greater than the authentic value (figure 10). More details of the algorithms are provided in figures 11 and 12 below.
The experiment was repeated ten times, and each time, the original dataset was divided randomly into ten parts as described above. We compared the classified labels with the corresponding ground truth to estimate the detection accuracy of our GMM-based detector and obtained the results shown in table 1.
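The decision rule of the testing phase can be sketched in a few lines of Python. The example below assumes each trained model is available as a (weights, means, variances) triple with diagonal covariances, and it compares log-densities, which gives the same decision as comparing the densities themselves; the toy parameters are invented and do not come from the experiment.

```python
import math

def log_gmm_density(x, gmm):
    """Log of the mixture density for a diagonal-covariance GMM."""
    weights, means, variances = gmm
    logs = []
    for w, mean, var in zip(weights, means, variances):
        log_b = sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
                    for xi, m, v in zip(x, mean, var))
        logs.append(math.log(w) + log_b)
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))

def classify_group(x, gmm_spoofed, gmm_authentic):
    """Label a feature vector as spoofed if the spoofed model scores higher."""
    if log_gmm_density(x, gmm_spoofed) > log_gmm_density(x, gmm_authentic):
        return "spoofed"
    return "authentic"

# Toy models (hypothetical parameters).
gmm_authentic = ([1.0], [[0.3, -0.2, 0.5]], [[0.01, 0.01, 0.01]])
gmm_spoofed = ([0.5, 0.5],
               [[0.0, 0.4, 0.4], [0.4, 0.0, 0.4]],
               [[0.01, 0.01, 0.01], [0.01, 0.01, 0.01]])

print(classify_group((0.3, -0.2, 0.5), gmm_spoofed, gmm_authentic))
print(classify_group((0.0, 0.4, 0.4), gmm_spoofed, gmm_authentic))
```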
From table 1, it can be seen that our GMM-based detector achieves a stable and high classification accuracy, which is better than that of existing methods [10,13,19].Furthermore, we would like to re-emphasize that our approach works well in a scenario where only one satellite is fake, whereas the existing methods do not.
Another advantage of the proposed GMM-based approach is that it does not depend on the C/N0 ratio. However, because our approach must be trained in advance, it requires a full-period collection of GNSS data with manual labels. This could be considered a major disadvantage compared with existing approaches.

Conclusion
In this study, we propose a novel approach to detect GNSS spoofing attacks based on a combination of DD measurements and a GMM. The DD measurement contains information regarding the AoA of the satellite signal (whether it is fake or authentic). Existing methods utilize DD measurements under the assumption that fake signals come from the same direction if broadcast from a single spoofer. Therefore, in a complicated scenario where multiple spoofers are present or only one fake signal is transmitted, such existing approaches are not sufficiently robust to detect an attack. Instead of using an absolute single value of the DD measurement, we propose combining three values from a group of four satellites into a feature vector or feature point. The distribution of these feature vectors (points) reflects some hidden geometric correlations of satellites; hence, they may be used to reveal spoofing attacks. Our visualization section clearly explains the concept of fake and authentic point distributions, and the experimental results confirm that the performance of our detector is better than that of existing detectors. In this study, we have not yet identified the fake satellite exactly; however, it is possible to do so if we take the intersection of all groups containing the fake satellite detected by the GMM-based detector.
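The intersection idea mentioned at the end of the conclusion can be sketched as follows. This is only an illustration of the suggestion, not an evaluated method: it assumes a perfect detector and a single fake satellite, and the PRN numbers of the authentic satellites are made up. With several fake satellites the plain intersection may be empty, so a more elaborate rule would be needed.

```python
from itertools import combinations

def intersect_flagged_groups(flagged_groups):
    """Return the satellites common to every group flagged as spoofed."""
    groups = [set(group) for group in flagged_groups]
    if not groups:
        return set()
    return set.intersection(*groups)

# Hypothetical epoch: PRN 1 is fake, the others are authentic.
satellites = [1, 20, 21, 22, 23]
fake = {1}
# An ideal detector flags exactly the groups that contain a fake satellite.
flagged = [g for g in combinations(satellites, 4) if fake & set(g)]
suspects = intersect_flagged_groups(flagged)
print(suspects)
```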

Figure 2. Double-carrier phase difference measurement in a mixed dataset with a fake satellite as the reference.

Figure 3. Double-carrier phase difference measurement in a mixed dataset with a real satellite as the reference.

Figure 4. Distribution of double-carrier phase difference points for all four-satellite combinations.

Figure 5. Double-carrier phase difference planes for the mixed data, including two spoofed and two real satellites.

Figure 6. Double-carrier phase difference points of real and fake data.

Figure 7. Double-carrier phase difference points of the GNSS data epochs with only one fake satellite.

Figure 9. Training phase of our spoofing detector.

Table 1. Results of cross-validation testing.