Pedestrians' safety is affected by their environment. Pedestrians must perceive the current situation, understand its meaning, and project the situation into the future to avoid any kind of collision while walking (Lee et al. 2021). This underscores the importance of knowing pedestrians' SA through effective measurement methods. The current study aimed to compare the effectiveness of two SA measurements in a pedestrian environment. The significance of the study comes from its theoretical contribution to understanding the effectiveness of SA measurement in the pedestrian domain, which has different characteristics from the domains examined in previous studies. To evaluate the two SA measurements, this study designed experiments based on GDTA in pedestrian environments and used sensitivity, intrusiveness, and predictive validity as evaluation parameters.
Sensitivity. Statistical analysis results show that SPAM accuracy and SPAM-response time were more sensitive in capturing the difference in SA scores between the three 'workload' treatments. The differences appeared between 'low' and 'high' workload and between 'normal' and 'high' workload. Adding 'workload' in the experiment led to lower accuracy and longer response times for participants, which indicates a decrease in a person's SA. Meanwhile, the results from SAGAT did not show significant differences across the three scenarios. Thus, SPAM as real-time-probing shows better sensitivity than SAGAT as freeze-probing in the pedestrian domain.
The statistical analysis results on this parameter support the conclusions of several previous studies in different domains (Endsley 2021). In previous work, SPAM was more sensitive than SAGAT in capturing differences in the Integrated Hazard Display in the flight domain, although it required a longer response time to answer queries (Alexander and Wickens 2005). Other work in the highly autonomous flight domain shows similar results: SPAM captured differences in the learning effects of missile factors, whereas SAGAT did not show any significance for the same variables (Cummings and Guerlain 2007). Another study using SAGAT and SPAM in an emergency event reported that neither SPAM nor SAGAT could distinguish the effects of the 'workload' treatment. However, SPAM accuracy was more sensitive in capturing differences in SA scores based on the effect of participants' 'experience' (Silva et al. 2017). Nonetheless, SAGAT was more sensitive than SPAM in some studies, particularly with respect to 'display' differences.
Intrusiveness. Regarding intrusiveness, the results in Fig. 2 show that performance scores under SA-SPAM measurement are lower than under SA-SAGAT in all scenarios. These results support the conclusions of previous studies, which reported a negative effect of SPAM verbal questions on performance during SA measurement (Pierce 2012; Loft et al. 2016). The Wilcoxon test results support the view that the differences in performance scores can be affected by intrusion from the use of SPAM. In the SPAM experiment, there were two points in the timeline where respondents had to run three activities at once. These activities include walking, listening to videos, mentioning objects by name, and answering SPAM questions directly. Such parallel activity may be the cause of the decline in the performance of most respondents. Although participants were given the flexibility to determine when they were ready, answering SA questions alongside the main task creates a secondary task that interferes with the performance of the main task (Silva et al. 2013). In addition, the performance differences may also be caused by the shorter time participants had to run the verbal protocol in the SPAM experiments than in the SAGAT experiments. The analysis was therefore judged insufficient to conclude that intrusion from the use of SPAM was the main cause of the decline in performance scores.
Furthermore, the results in Fig. 5 show that performance scores in both methods decrease from 'normal' to 'high' workload. With SAGAT, the performance score increases from 'normal' to 'low' workload, while the opposite occurs with SPAM, where the performance score decreases from 'normal' to 'low' workload. Statistical analysis shows significant differences between the two methods in the 'low' and 'high' workload scenarios. This shows that the performance scores of the two methods change differently, although both decrease from 'normal' to 'high' workload.
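The Wilcoxon comparison of performance scores between the two methods can be illustrated with a minimal sketch. It assumes the paired (signed-rank) variant of the test, with the same participants completing both the SAGAT and the SPAM experiment; the function names and all scores below are hypothetical and are not the data of this study. The exact p-value is obtained by enumerating every sign pattern, which is feasible for a sample of about 11 participants.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W, exact two-sided p) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    total = sum(ranks)
    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        if min(s, total - s) <= w:
            extreme += 1
    return w, extreme / 2 ** len(ranks)


# Hypothetical performance scores for 11 participants (illustration only).
sagat_perf = [78, 82, 75, 90, 68, 85, 80, 73, 88, 77, 84]
spam_perf = [70, 79, 76, 80, 63, 74, 68, 66, 75, 71, 70]

w, p = wilcoxon_signed_rank(sagat_perf, spam_perf)
print(f"W = {w}, exact two-sided p = {p:.4f}")
```

A small p-value in such a test indicates only that the performance scores differ between the two experiments; it does not by itself separate intrusion from other causes such as differences in measurement duration.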
Predictive validity. The overall results in Table 1 show that SAGAT and SPAM-response time can predict performance at a substantial level in the 'low' scenario. However, the other results show that predictive ability tends to be weak. SAGAT produces better predictive values in the 'high' and 'normal' scenarios. These results support the meta-analysis by Endsley (2021), which concluded that SAGAT and SPAM can predict performance. Another study by Endsley reported that SAGAT has predictive validity when measuring pilot SA in a combat simulation (Endsley et al. 2000b). SAGAT measurements that include more queries show that SAGAT can be more predictive than SPAM (Loft et al. 2015). The same result was found in this study, where there were twice as many SAGAT queries as SPAM queries.
Meanwhile, when SAGAT was limited to just one query at each freeze and the sample was reduced, the results showed SPAM to be more predictive than SAGAT (Durso et al. 2006). Furthermore, other studies mention that SPAM cannot predict well when workloads are high because key task demands lead subjects to ignore SPAM questions (Pierce et al. 2008). The results of this study support this finding: SPAM shows a lower predictive value than SAGAT in the 'high' scenario.
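As an illustration of how predictive validity can be quantified, the sketch below computes a Spearman rank correlation between an SA measure and the performance score. This is an assumption made for illustration: the statistic underlying Table 1 may differ, and the function names and all values are hypothetical rather than taken from this study. For SPAM-response time, a negative correlation would be expected, since longer response times indicate lower SA.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-participant values in one workload scenario (illustration only).
sa_accuracy = [0.60, 0.75, 0.50, 0.90, 0.65, 0.80, 0.70, 0.55, 0.85, 0.95, 0.45]
performance = [72, 80, 65, 88, 74, 79, 70, 68, 90, 86, 60]

rho = spearman(sa_accuracy, performance)
print(f"Spearman rho between SA accuracy and performance: {rho:.2f}")
```

Computing the coefficient separately for each workload scenario and each SA measure would give a table of predictive values comparable in layout to the one discussed above.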
Thus, both methods have advantages and disadvantages on each of the parameters. However, SPAM is considered more suitable for SA measurement in pedestrian domains, which tend to involve simple, stable, and recurring patterns (Tong et al. 2018). SPAM predicts performance better in 'low' workload conditions. This condition matches everyday walking, which is carried out without a high workload. SPAM is also more sensitive in capturing differences in pedestrian environments. In real conditions, pedestrians are often crossing or moving through dense traffic, where it is difficult to stop. SPAM is therefore more appropriate because it does not require stopping when measuring SA.
4.1 Findings
The results of this study on SA measurement in pedestrian environments yielded some new findings. First, the two most widely used SA measurements, namely freeze-probing and real-time-probing, were compared to find out which probe is more effective in pedestrian environments. Sensitivity, intrusiveness, and predictive validity parameters were used in evaluating both probes. Second, SPAM as real-time-probing is more sensitive to changes in the pedestrian environment, but its intrusion effect on performance needs to be minimized. Third, SAGAT as freeze-probing can predict better than SPAM at higher workload conditions, while SPAM-response time is better in low workload conditions.
4.2 Limitations and Future Study
There were some limitations in the current study. First, the results were based on experimental data from 11 participants. A larger sample size will be required to obtain more convincing results in the future. Second, future studies will need to consider other environmental factors, such as 'displays' relating to signs and regulations in pedestrian environments, to further evaluate the sensitivity of both measurement methods. Third, in evaluating intrusiveness, this study compared changes in performance from a relative perspective, namely between the SAGAT and SPAM experiments. The results were therefore considered less able to identify the difference in intrusion between the two SA measurements. Future evaluation studies based on an absolute perspective are expected to complement the results of this evaluation. In addition, performance measurement should use the same duration for SPAM and SAGAT so that the negative effect of differing measurement durations, which may have lowered performance scores when measuring SA with SPAM, can be reviewed further.