Situation Awareness Measurement in Pedestrian Environments: A Comparison between Freeze-probing and Realtime-probing technique

doi:10.21203/rs.3.rs-2066539/v1


Research Article


This work is licensed under a CC BY 4.0 License

Version 1


Pedestrian safety is affected by the walking environment. Reduced pedestrian awareness generally arises because walking is an activity repeated every day. Situation Awareness (SA) measurements can evaluate a pedestrian's ability to handle environmental situations, which makes it important to measure pedestrians' SA with effective methods. This study evaluates SA measurement methods by their effectiveness in a pedestrian environment, comparing the freeze-probing and real-time-probing techniques. Participants were eleven pedestrians (seven male) with an average age of 22 (SD ± 1.3) years and walking habits of 15 (SD ± 8.5) minutes/day. Participants walked on a treadmill at three speed levels while watching a video of a pedestrian environment. SA was measured by freeze-probing and real-time-probing in different scenarios. Performance was determined using a verbal protocol. Real-time-probing showed better sensitivity but more problems with intrusiveness. In regressions with SA scores as predictors and performance as the dependent variable, real-time-probing showed better predictive results under low-workload conditions, which correspond to the characteristics of the pedestrian domain. Based on these advantages, real-time-probing is considered more effective for SA measurement in pedestrian environments, which tend to be simple, stable, and low in workload. An effective measurement method yields an accurate SA score that can serve as the basis for developing the concept of SA and for designing new systems or procedures in the pedestrian environment.

comparison

freeze probing

real-time probing

pedestrian

situation awareness

Pedestrian safety has become a serious concern as the number of serious injuries and deaths from traffic accidents involving pedestrians increases. Nearly 50% of pedestrian deaths are related to disorder and unconsciousness (Lee et al. 2021). Reduced pedestrian awareness generally arises because walking is done repeatedly every day without much cognitive consideration (Harms et al. 2019). This automated behaviour increases the potential for unsafe situations for pedestrians (Nasar and Troyer 2013; Feld and Plummer 2019). In a pedestrian environment, many elements contribute to the safety of its users. To walk safely, pedestrians must understand these elements and predict their potential dangers (Lee et al. 2021). This matches the concept of situation awareness (SA), which refers to knowing what is happening in an environment and understanding what its elements mean now and in the future (Endsley and Garland 2000). In designing a new system or procedure in a particular domain, the individual level of SA becomes important (Endsley et al. 2003; Salmon et al. 2008). This highlights the importance of studying pedestrians' SA of their environment.

SA measurements can evaluate an operator's ability to handle the environment (Kraemer and Süß 2015). They can also provide clear data on the actual effects of a developed design concept (Endsley et al. 2003). Thus, researchers and designers must capture the SA level accurately. The basic step toward this is to provide a measurement method that is valid and reliable (Endsley 2021) and sensitive to differences in design concepts (Endsley et al. 2003).

In previous studies, the most widely used SA measurement methods were freeze probes and real-time probes. In freeze probes, the simulation is stopped at a randomly selected time, and the monitor or system view is blanked while the subject is asked about their perception of the current situation. The subject's perception is then compared with the actual situation (Endsley 1995). The Situation Awareness Global Assessment Technique (SAGAT) is the most popular freeze-probing technique (Salmon et al. 2006). This method was found to be a sensitive, reliable, and predictive measure of SA that is useful across various domains and experimental settings (Endsley 2021). However, some researchers note that freeze-probing techniques are criticized for intruding upon primary task performance (Salmon et al. 2006) and for requiring a task simulation, because it is difficult to stop a task in the field (Nguyen et al. 2019).

Meanwhile, in real-time probes, measurements are taken by asking questions while the subject performs the main task, either in the field or in simulation, without stopping it (Durso et al. 2006). The most common real-time-probing technique is the Situation Present Assessment Method (SPAM). Because SA is measured without stopping the task, this method can be used directly in the field or in real-world environments where tasks cannot be stopped (Zhang et al. 2020). However, when the subject is under a heavy task load, the questions add to that load. A question can also cue the subject to analyze the information on the display, so the method may not measure the subject's actual SA (Endsley 1995).

1.1 Related Study

In other domains, previous studies have evaluated the efficacy of SAGAT and SPAM. Indicators used to evaluate the effectiveness of SAGAT and SPAM in these studies include intrusiveness (Endsley et al. 2000b; Pierce 2012; Paige Bacon and Strybel 2013; Silva et al. 2013; Keeler et al. 2015; Loft et al. 2016), sensitivity (Endsley et al. 1998, 2000b; Jones and Endsley 2000, 2004; Alexander and Wickens 2005; Walker et al. 2008) and predictive validity (Durso et al. 1998, 2006; Endsley et al. 1998, 2000b, a; Pierce et al. 2008; Loft et al. 2015).

An SA measurement method is said to have good sensitivity when it can detect changes in treatment between measurements (Uhlarik and Comerford 2002). A common way to assess sensitivity is to vary the construct and check how well the measurement detects those variations. However, there is no direct way to vary SA independently, so variation is introduced through experimental manipulation (Endsley 2021). Previous studies evaluated sensitivity by manipulating experiments into several scenarios, such as different workloads (Jones and Endsley 2000) and display interfaces in the aviation domain (Endsley et al. 1998, 2000b; Jones and Endsley 2000, 2004; Alexander and Wickens 2005). Other studies tested sensitivity by manipulating vehicle feedback design in line with technological trends in automotive engineering (Walker et al. 2008). To measure sensitivity, most researchers use analysis of variance (ANOVA) (Endsley et al. 1998, 2000b; Jones and Endsley 2000, 2004; Alexander and Wickens 2005) to test whether there are differences between treatments. There is also a meta-analytic approach based on Cohen's d, which compares effect sizes. However, this approach can only assess sensitivity by comparing the relative difference between the means of two groups. In other words, it is not appropriate when SA results take the form of proportions (correct or incorrect), as SA measurements generally do. It can, however, be applied to SPAM results based on response time (Endsley 2021).

Intrusiveness refers to the extent to which a measurement method interferes with the main task and thereby changes SA from its actual level (Uhlarik and Comerford 2002). In general, intrusiveness is evaluated through performance or workload, by comparing pre-experimental performance scores with performance scores during the experiment (Endsley 2021). In previous studies, the pre-experimental condition was a baseline in which participants performed the primary task without any SA measurement (Paige Bacon and Strybel 2013; Kraemer and Süß 2015). In other studies, participants reported the level of measurement intrusion subjectively (Endsley et al. 2000b).

Criterion validity determines how well a measurement method can predict outcomes based on other variables. It comprises two types that differ in measurement time: concurrent validity and predictive validity. Predictive validity measures how well variables predict future events, so the test is done at a different time (Neuman 2014). An SA measurement method is said to have good criterion validity when it can predict behavior or performance on other aspects (Uhlarik and Comerford 2002). In evaluating criterion validity in the military domain, previous studies used regression analysis to determine predictive values based on correlation (Salmon et al. 2009). Regression analysis is a statistical method that generates a model for predicting the value of a dependent variable from one or more independent variables (Field 2009). The parameter used is R² (Pierce et al. 2008; Strybel et al. 2008) or adjusted R² (Durso et al. 1998, 2006; Endsley et al. 1998, 2000a), which reflects the correlation between predicted and observed values.

Previous studies evaluating the effectiveness of SA methods have generally examined domains whose characteristics differ from those of pedestrian environments. These domains tend to be dynamic and highly uncertain (Chmielewski et al. 2018). Their main tasks are operationally complex (Endsley and Rodgers 1994b), requiring operators to interpret various information at a specific time and to deal with unexpected technical errors (Li et al. 2020). Previous researchers have noted that no universally accepted SA measure has been agreed upon.

1.2 The Present Study

Which of the available measurement methods is most appropriate for assessing SA remains debatable (Salmon et al. 2009). Therefore, this study evaluates SA measurement methods in an environment whose characteristics differ from those in previous studies. The pedestrian environment is characterized as simple, stable, and following recurring patterns (Tong et al. 2018). Its main task, walking, is fundamental to each individual and is done every day (Harms et al. 2019). SA-related studies in pedestrian environments have so far only applied existing methods chosen from literature reviews (Pielot et al. 2010; Zhang et al. 2010; Sheik-nainar et al. 2015; Wolf and Kuber 2018; Lee et al. 2021) without considering the efficacy of the measurement. Few studies were found that evaluate the effectiveness of SA measurement methods in pedestrian environments.

SA measurement studies in other domains have evaluated methods based on specific indicators. Accordingly, this study aims to evaluate SA measurement methods by their effectiveness in a pedestrian environment. The evaluation compares two SA methods: SAGAT, representing freeze-probing, and SPAM, representing real-time-probing. It examines which method is more effective in providing an actual measure of SA in a pedestrian environment, based on sensitivity, intrusiveness, and predictive validity. An effective SA measurement method yields an accurate measure of SA levels. Accurate measurement results can serve as the basis for developing the concept of SA in the pedestrian environment and for designing new systems or procedures there.

2.1 Participants

Eleven participants (seven male and four female) with an average age of 22 (SD ± 1.3) years were recruited for this study. They had normal or corrected-to-normal vision and no motor disabilities. All participants reported walking habits of 15 (SD ± 8.5) minutes/day. Before starting the experiment, participants received an explanation of the experimental procedures and gave written consent to participate. All participants received 5 USD per day during the experiment for their participation.

2.2 Apparatus and Scenario

The walking simulation used a Kettler Pacer treadmill to ensure participants walked at a constant speed. The video simulating the pedestrian environment was displayed on a large 40-inch Full HD screen (40K6300, Samsung). Participants responded to SA queries in Google Forms via an iPhone 6. During the experiment, all participants' activities were recorded using a Canon M10 camera.

The experiment was designed around a 'workload' treatment. Workload was determined by each participant's walking speed. The medium speed level was the participant's average walking speed during the last 30 seconds of the practice session before the experiment began. The medium level then served as the reference for the other two levels: the high and low levels were +/- 50% of the medium speed (Shaw et al. 2018). These three speed levels were then used in the experiment, so each participant's treadmill speed matched their own ability.
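The derivation of the three speed levels can be illustrated with a short sketch. It assumes one treadmill speed sample per second and km/h units, neither of which the study states; the function name `speed_levels` and the sample values are hypothetical.

```python
def speed_levels(practice_speeds, window=30):
    """Derive low/medium/high treadmill speeds from practice samples.

    Assumption: practice_speeds holds one speed sample per second, so the
    last `window` entries cover the final 30 seconds of practice.
    """
    recent = practice_speeds[-window:]
    medium = sum(recent) / len(recent)      # average of the last 30 s
    return {
        "low": 0.5 * medium,                # medium speed minus 50%
        "medium": medium,
        "high": 1.5 * medium,               # medium speed plus 50%
    }


# Invented practice session: 60 s at 3.0 km/h, then 30 s at 4.0 km/h.
samples = [3.0] * 60 + [4.0] * 30
levels = speed_levels(samples)
print(levels)
```

Because the levels are computed per participant, the same relative workload is applied to everyone even though absolute speeds differ.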

SA was measured with two methods: a freeze probe, represented by SAGAT, and a real-time probe, represented by SPAM. The 'workload' treatment results in three scenarios for each method. The experiment was designed as a within-subject study. The order of scenarios was randomized, with a complete counterbalancing procedure to avoid carryover effects.

2.3 Experimental Procedures

The pedestrian environment contains multiple unpredictable variables that affect participants' awareness (e.g., traffic, noise), and experiment-based studies are better at controlling these variables (Lee et al. 2021). All participants performed three scenarios for each method. In each scenario, the participant's main activity was to walk on a treadmill in an enclosed, soundproofed space while viewing a video of a pedestrian environment. This setting simulated the environment and walking activity of actual conditions. Using video controlled the effects of environmental dynamics, so all participants experienced the same dynamics of the pedestrian environment. The video was identical across all scenarios. In each scenario, the treadmill ensured that participants walked at a constant speed for the given treatment.

Participants were briefed and practised before running the first scenario. The instruction text was displayed on the screen and read to the participant as a briefing. Participants were asked to raise questions if they did not understand the instructions. After the briefing, participants trained on the treadmill at a self-determined pace. Training was considered complete when a participant could walk steadily on the treadmill while watching the video and naming the objects in it. The average walking speed in the last 30 seconds of practice became that participant's medium speed level in the experiment. High and low workload levels were +/- 50% of the medium speed. After the practice stage, participants were given a 30-minute break before the experimental stage. Each scenario lasted approximately 10 minutes, with a 30-minute break between scenarios. After completing each day's experiment session, all participants were rewarded with a shopping voucher, with additional rewards for those who achieved the best performance.

2.4 SA Measurements

Freeze-probing. SA was measured with the SAGAT method at the three levels of the three-level model of SA. The probes include queries that assess the participants' perception, comprehension, and projection of the pedestrian environment. Twelve SAGAT queries were administered at two freeze points in each scenario. Each freeze lasted no more than 6 minutes. During a freeze, the video screen was blanked and the simulation was frozen. Participants answered the probes by filling out a Google Form while the treadmill was stopped. Responses to SAGAT probes were scored 1 (correct) or 0 (incorrect). High SAGAT scores indicate a good SA level. The correct responses of all participants were summarized as means and standard deviations for each scenario.
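The SAGAT scoring described above can be sketched as follows. This is a minimal illustration, assuming a participant's score is the count of correctly answered queries; the query identifiers, answers, and function names are invented for the example and are not the study's materials.

```python
from statistics import mean, stdev


def sagat_score(responses, answer_key):
    """Count correct answers: each query scores 1 (correct) or 0 (incorrect).

    Unanswered queries are treated as incorrect (an assumption).
    """
    return sum(1 for query, truth in answer_key.items()
               if responses.get(query) == truth)


def summarize(scores):
    """Mean and sample standard deviation across participants."""
    return mean(scores), stdev(scores)


# Hypothetical answer key and responses for one scenario.
answer_key = {"q1": "car", "q2": "left", "q3": "yes"}
participants = [
    {"q1": "car", "q2": "left", "q3": "yes"},
    {"q1": "car", "q2": "right", "q3": "yes"},
    {"q1": "car"},
]
scores = [sagat_score(p, answer_key) for p in participants]
scenario_mean, scenario_sd = summarize(scores)
print(scores, scenario_mean, scenario_sd)
```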

Realtime-probing. SA was measured with SPAM using queries adapted from the SAGAT queries. Participants responded to three queries in the 3–4 minute range and three queries in the 9–10 minute range. The SA score was based on the accuracy of the participant's answer and the time taken to respond, so participants were asked to answer as accurately and quickly as possible. As with SAGAT, responses to SPAM probes were scored 1 (correct) or 0 (incorrect), and the response time was recorded in seconds. Response time was measured from the moment the question was heard in a recording until the participant began to state the answer. Using a recording ensured that all participants heard the questions with the same timing. As in freeze-probing, all participants' correct responses and response times were summarized as means and standard deviations for each scenario. High response accuracy and short response time indicate a good SA level.
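The two SPAM measures can be sketched in the same way. The sketch assumes timestamps in seconds on the session recording and takes response time as answer onset minus the time the question was heard; the class, field names, and data are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SpamProbe:
    correct: bool            # scored 1 (correct) or 0 (incorrect)
    question_heard_s: float  # when the recorded question was heard
    answer_onset_s: float    # when the participant began to answer

    @property
    def response_time_s(self):
        return self.answer_onset_s - self.question_heard_s


def spam_scores(probes):
    """Return (accuracy, mean response time in seconds) for one scenario."""
    accuracy = mean(1 if p.correct else 0 for p in probes)
    mean_rt = mean(p.response_time_s for p in probes)
    return accuracy, mean_rt


# Invented probes for one participant in one scenario.
probes = [
    SpamProbe(True, 200.0, 202.0),
    SpamProbe(False, 215.0, 219.0),
    SpamProbe(True, 230.0, 233.0),
    SpamProbe(True, 560.0, 563.0),
]
accuracy, mean_rt = spam_scores(probes)
print(accuracy, mean_rt)
```

Keeping accuracy and response time as separate outputs mirrors the way the study reports SPAM-accuracy and SPAM-response time as distinct measures.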

SA queries are generally developed from a Goal Directed Task Analysis (GDTA) to ensure that the questions match a person's way of thinking about information. In this study, the analysis was used to determine what pedestrians need to know to meet their goals. This takes the form of dynamic situational information that can influence pedestrian decisions, not static knowledge such as procedures or rules. An established analysis of SA requirements can serve as the basis for developing SA queries for a domain. Domains for which previous researchers have analyzed SA requirements include aviation (Endsley 1993; Endsley and Rodgers 1994a; Endsley et al. 1998), nuclear technology (Hogg et al. 1994), medicine (Tse et al. 2020), emergency management (Cuevas et al. 2011), and marine operations (Sharma et al. 2019).

Meanwhile, other domains that need to develop queries can adapt the same methodology from an existing GDTA with some adjustments (Endsley et al. 2003). This study developed queries based on the GDTA, the experimental capabilities, and the experimental design. The GDTA was prepared through observation and a review of previous research, and the resulting analysis was then validated with several subjects (Endsley and Garland 2000). The analysis consists of the goals and sub-goals of pedestrian activity, the decisions that need to be made for each sub-goal, and the SA requirements at levels 1, 2, and 3 for each decision. The GDTA for the pedestrian domain is shown in Fig. 1.

2.5 Performance Measurement

Performance was measured using a verbal protocol in all six scenarios. The verbal protocol assesses participant performance by how often the participant names the objects that are most often named by other participants. Objects most often named by other participants are assumed to attract and hold attention while walking (Read et al. 2015). The object names mentioned by participants were identified from video recordings of the experiments. Each object name mentioned by a participant is given a score of 1, so the object named by the most participants receives the highest score. The score determines the weight of each object, with the highest weight given to the object with the highest score.

The participants' performance scores were then calculated from the object weights. The weights of the objects mentioned by each participant are summed, and the sum is that participant's performance score. The higher the score, the better the participant's performance. The verbal protocol was paused briefly while participants answered questions in both the SPAM and SAGAT methods.
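One possible reading of this weighting scheme is sketched below. It assumes that an object's weight equals the number of participants who named it at least once, and that a participant's score sums the weights of the distinct objects they named; the study does not give the exact formula, so both choices, the function names, and the data are assumptions.

```python
from collections import Counter


def object_weights(mentions_by_participant):
    """Weight of an object = number of participants who named it."""
    counts = Counter()
    for objects in mentions_by_participant.values():
        counts.update(set(objects))  # one point per participant per object
    return counts


def performance_scores(mentions_by_participant):
    """Sum the weights of the distinct objects each participant named."""
    weights = object_weights(mentions_by_participant)
    return {
        participant: sum(weights[obj] for obj in set(objects))
        for participant, objects in mentions_by_participant.items()
    }


# Invented mentions extracted from the video recordings.
mentions = {
    "P1": ["car", "tree", "sign"],
    "P2": ["car", "sign"],
    "P3": ["car"],
}
weights = object_weights(mentions)
perf = performance_scores(mentions)
print(dict(weights), perf)
```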

2.6 Statistical Analysis

The effectiveness of SAGAT and SPAM was evaluated based on sensitivity, intrusiveness, and predictive validity. SA scores are reported as means and standard deviations. After data transformation, the Shapiro-Wilk test showed that all collected data were normally distributed, except for SPAM accuracy and the SPAM performance score.

Sensitivity refers to the extent to which the instrument can distinguish changes in conditions or treatments (Uhlarik and Comerford 2002). In this study, the manipulation consisted of varying the 'workload' treatment. For normally distributed data, differences in mean SA scores were evaluated using one-way ANOVA, with a Tukey HSD post hoc analysis to examine the significance of the differences between scenarios. Friedman's ANOVA was used for data that were not normally distributed. Statistical significance was set at p < 0.05, indicating a difference in average values between scenarios.
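For the non-normal case, the Friedman statistic for three related conditions can be computed by hand, as in the sketch below. It is illustrative only: the study ran its tests in SPSS, the data here are invented, no tie correction is applied, and the p-value relies on the chi-square approximation, which for 2 degrees of freedom has the closed form exp(-x/2).

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_three_conditions(rows):
    """Friedman chi-square and approximate p for rows of (low, normal, high)."""
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(list(row))):
            rank_sums[col] += rank
    stat = (12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
            - 3.0 * n * (k + 1))
    p_value = math.exp(-stat / 2.0)  # chi-square survival function, df = 2
    return stat, p_value


# Invented SPAM accuracy per participant under low, normal, high workload.
rows = [(0.9, 0.8, 0.5), (1.0, 0.7, 0.6), (0.8, 0.7, 0.3), (0.9, 0.6, 0.5)]
stat, p_value = friedman_three_conditions(rows)
print(stat, p_value)
```

With small samples such as the eleven participants here, an exact test may give a different p-value than this approximation.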

In general, previous studies compared changes in absolute performance; that is, they compared performance scores before measurement with scores during measurement by a given method. This study instead evaluated performance changes from a relative perspective. The evaluation is based on the difference between verbal-protocol performance scores when SA was measured with SAGAT and with SPAM. Measurements in the 'normal' scenarios serve as a baseline, on the grounds that those scenarios are not affected by 'workload' factors. The scenario is assumed to provide the same condition for comparing performance changes across SA measurements. The Wilcoxon test was used to test whether performance scores differed between the two experiments, given that the same participants took part in both. A p-value < 0.05 indicates a difference in average values between scenarios. In addition, the correlation of performance scores between the SAGAT and SPAM experiments was analyzed.
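The paired comparison can be illustrated with a hand-rolled Wilcoxon signed-rank test. This sketch computes an exact two-sided p-value by enumerating every sign assignment, which is feasible for small samples such as eleven participants; SPSS may report an asymptotic p-value instead, so results need not match the study's figures. The function name and scores are hypothetical.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Return (W+, exact two-sided p) for paired samples x and y.

    Zero differences are dropped; tied absolute differences share
    their average rank.
    """
    diffs = sorted((a - b for a, b in zip(x, y) if a != b), key=abs)
    n = len(diffs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        for pos in range(i, j + 1):
            ranks[pos] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2          # expected W+ under the null hypothesis
    observed = abs(w_plus - centre)
    extreme = 0
    for signs in product((0, 1), repeat=n):  # all 2**n sign patterns
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Invented performance scores for five participants.
sagat_perf = [12, 15, 11, 14, 16]
spam_perf = [11, 13, 8, 10, 11]
w_plus, p_exact = wilcoxon_signed_rank(sagat_perf, spam_perf)
print(w_plus, p_exact)
```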

Predictive validity is the extent to which the instrument can predict behaviour or future real-life performance (Uhlarik and Comerford 2002). Because the independent variable (SA) and the dependent variable (performance) are both metric, namely ratio data, the test uses simple linear regression. The generic regression model is Y = a + b X, where the dependent variable Y is the performance score and the independent variable X is the SA score. To show that SA is a viable and measurable construct, individuals should vary in their levels of SA, and this variance should be useful in predicting performance (Durso et al. 1998). SA is therefore the independent variable and performance the predicted variable. In this study, predictive validity was assessed on the SA and performance measurements from SAGAT and SPAM, where SPAM provides two categories of measurement: accuracy and response time. R² values closer to 1 indicate more predictive measurements. The effect size of the calculated R² may be interpreted using the descriptors proposed by Cohen (Cohen 1988): 0.26 (substantial), 0.13 (moderate), and 0.02 (weak). All statistical analyses were performed using SPSS v 26.
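The regression model and the Cohen descriptors can be sketched as follows. This is an ordinary least-squares fit of Y = a + b X written out by hand, with invented data; the label "negligible" for values below 0.02 is an assumption added here, since the text only names three categories.

```python
def simple_regression(x, y):
    """Fit Y = a + b X by least squares and return (a, b, R squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, 1 - ss_resid / ss_total


def cohen_label(r_squared):
    """Map R squared to the descriptors cited from Cohen (1988)."""
    if r_squared >= 0.26:
        return "substantial"
    if r_squared >= 0.13:
        return "moderate"
    if r_squared >= 0.02:
        return "weak"
    return "negligible"  # assumed label for values below the weak threshold


# Invented SA scores (X) and performance scores (Y).
sa_scores = [1, 2, 3, 4]
performance = [3, 5, 7, 9]
a, b, r_squared = simple_regression(sa_scores, performance)
print(a, b, r_squared, cohen_label(r_squared))
```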

3.1 Sensitivity

The evaluation is based on three 'workload' treatments, namely the 'low', 'normal', and 'high' scenarios. Based on the Friedman ANOVA, SPAM accuracy showed a significant difference in SA scores between the three scenarios (p = 0.003). SPAM-response time showed the same result in the one-way ANOVA (p = 0.006). The post hoc test located the response-time differences between the 'low' and 'high' scenarios (p = 0.006) and between the 'normal' and 'high' scenarios (p = 0.040). However, the ANOVA showed no significant difference in SAGAT SA scores among the three 'workload' treatments.

3.2 Intrusiveness

The intrusiveness of the SA measurement methods was evaluated based on changes in participants' performance scores. Participants' performance, measured using the verbal protocol, yielded two sets of performance scores: one from the SPAM experiments and one from the SAGAT experiments. The correlation of performance scores between the SAGAT and SPAM experiments is positive and strong in all scenarios (low 0.77; normal 0.86; high 0.88). These results indicate that the performance measurement captures respondents' performance in the same way in both experiments, even though the two experiments differ.

The performance score in the SPAM experiments is lower than in the SAGAT experiments. This is due to a difference in the duration of the simulation video available for the verbal protocol. In the SAGAT experiment, the 'freeze' procedure also paused the simulation video. In the SPAM experiments, the simulation video kept running while the query was read aloud and participants responded to it. As a result, participants had less time to run the verbal protocol in the SPAM experiments than in the SAGAT experiments. Changes in the performance scores of all participants are shown in Fig. 2.

In Fig. 2, the overall result shows that participants' performance scores were lower when SA measurements used SPAM than SAGAT in all scenarios. Moreover, SPAM performance scores increase during a 'low' workload and decrease during a 'high' workload. Meanwhile, SAGAT's performance score decreased both during 'low' and 'high' workloads.

Furthermore, statistical analysis was carried out to identify differences in performance scores between the SPAM and SAGAT experiments. The Wilcoxon test was applied to each 'workload' treatment. The results showed a significant difference in performance scores between the SPAM and SAGAT experiments in all scenarios ('low': p = 0.006; 'normal' and 'high': p = 0.029).

3.3 Predictive validity

Predictive validity was assessed with SA as the independent variable and performance as the dependent variable in each scenario. Based on the results in Table 1, SAGAT yields higher R² values than both SPAM measures in the 'normal' and 'high' scenarios, while SPAM-response time yields the highest value in the 'low' scenario.

Table 1

Regression results (R² values by workload scenario)
Method	Low	Normal	High
SAGAT	0.319^a	0.247	0.166
SPAM-response time	0.691^a	0.091	0.002
SPAM-accuracy	0.112	0.179	0.057
^a Value is in the substantial category

Pedestrian safety is affected by their environment. Pedestrians must have a perception of the current situation, understand its meaning, and project the situation in the future to avoid any kind of collision while walking (Lee et al. 2021). This refers to the importance of knowing the SA of pedestrians using effective measurement methods. The current study aimed to compare the effectiveness of the two SA measurements in a pedestrian environment. The significance of the study comes from its theoretical contributions to understanding the effectiveness of SA measurement in the pedestrian domain, which has different characteristics compared to other domains in previous studies. In evaluating two SA measurements, this study designed experiments based on GDTA in pedestrian environments and used sensitivity, intrusiveness, and predictive validity parameters.

Sensitivity. Statistical analysis results show that SPAM accuracy and SPAM-response time were more sensitive in capturing the difference in SA scores between the three 'workload' treatments. The difference is shown in 'low' to 'high' workload and 'normal' to 'high' workload. The addition of 'workload' to the experiment led to a decrease in accuracy and a longer response time for participants, which indicated a decrease in a person's SA. Meanwhile, the results from SAGAT did not show any significant differences in all three scenarios. Thus, SPAM in real-time-probing shows better sensitivity than SAGAT as freeze-probing in pedestrian domains.

The statistical results for this parameter support the conclusions of several previous studies in different domains (Endsley 2021). In one study, SPAM was more sensitive than SAGAT in capturing differences in the Integrated Hazard Display in the flight domain, although it required a longer response time to answer queries (Alexander and Wickens 2005). A study in the highly autonomous flight domain showed similar results: SPAM captured differences in the learning effects of missile factors, whereas SAGAT showed no significant effect for the same variables (Cummings and Guerlain 2007). Another study using SAGAT and SPAM in an emergency event reported that neither method could distinguish the effects of the 'workload' treatment, but that SPAM accuracy was more sensitive in capturing differences in SA scores related to participants' 'experience' (Silva et al. 2017). Nonetheless, SAGAT was more sensitive than SPAM in some studies, particularly with regard to 'display' differences.

Intrusiveness. The results in Fig. 2 show that performance scores in all scenarios are lower when SA is measured with SPAM than with SAGAT. These results support previous studies that reported a negative effect of SPAM's verbal questions on performance during SA measurement (Pierce 2012; Loft et al. 2016). The Wilcoxon test results support the view that the differences in performance scores can be affected by intrusion from the use of SPAM. In the SPAM experiment, there were two points in the timeline where respondents had to perform several activities at once: walking, watching the video, naming objects, and answering SPAM questions. Such parallel activity may explain the decline in performance of most respondents. Although participants were free to decide when they were ready to answer, answering SA questions alongside the main task creates a secondary task that interferes with the performance of the main task (Silva et al. 2013). In addition, the performance differences may also result from the shorter time participants had to run the verbal protocol in the SPAM experiments than in the SAGAT experiments. The analysis is therefore insufficient to conclude that intrusion from SPAM was the main cause of the decline in performance scores.

Furthermore, the results in Fig. 5 show that performance scores in both methods decreased from the 'normal' to the 'high' workload. With SAGAT, the performance score increased from the 'normal' to the 'low' workload, whereas with SPAM it decreased from the 'normal' to the 'low' workload. However, statistical analysis shows significant differences between the two methods in the 'low' and 'high' workload scenarios. This shows that performance scores changed differently between the two methods, even though both decreased from the 'normal' to the 'high' workload.

Predictive validity. The overall results in Table 1 show that SAGAT and SPAM-response time can predict performance at the substantial level in the 'low' scenario. However, the other results show that predictive ability tends to be weak. SAGAT produces better predictive values in the 'high' and 'normal' scenarios. These results support the meta-analysis by Endsley (2021), which concluded that SAGAT and SPAM can predict performance. Another study reported that SAGAT has predictive validity when measuring pilot SA in a combat simulation (Endsley et al. 2000b). When SAGAT measurements include more queries, SAGAT can be more predictive than SPAM (Loft et al. 2015). The same pattern appeared in this study, where there were twice as many SAGAT queries as SPAM queries.

Meanwhile, when SAGAT was limited to just one query at each freeze and the sample was reduced, SPAM was more predictive than SAGAT (Durso et al. 2006). Other studies report that SPAM cannot predict well when workload is high, because the demands of the main task lead subjects to ignore SPAM questions (Pierce et al. 2008). The results of this study support this finding: SPAM shows a lower predictive value than SAGAT in the 'high' scenario.
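
The predictive-validity comparison rests on regressing performance on an SA score. A minimal Python sketch of that calculation follows. It assumes a simple one-predictor least-squares regression, which may differ from the exact model used in the analysis, and the numbers are hypothetical values rather than the study's data.

```python
def simple_regression(sa_scores, performance):
    """Least-squares fit of performance on SA score.

    Returns (slope, intercept, r_squared).
    """
    n = len(sa_scores)
    if n != len(performance) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(sa_scores) / n
    mean_y = sum(performance) / n
    sxx = sum((x - mean_x) ** 2 for x in sa_scores)
    syy = sum((y - mean_y) ** 2 for y in performance)
    if sxx == 0 or syy == 0:
        raise ValueError("predictor and outcome must both vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sa_scores, performance))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(sa_scores, performance))
    return slope, intercept, 1 - ss_res / syy


# Hypothetical SA scores and performance scores for one workload scenario
# (illustrative values only, NOT study data).
sa_low = [6, 7, 5, 8, 6, 9, 7, 5, 8, 6, 7]
perf_low = [12, 13, 10, 15, 11, 16, 14, 11, 14, 12, 13]

slope, intercept, r2 = simple_regression(sa_low, perf_low)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r2:.3f}")
```

The coefficient of determination returned here is the quantity that would be compared against whichever effect-size cut-offs the analysis adopts. Those cut-offs are not restated in this section, so the sketch does not assign category labels.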

Thus, both methods have advantages and disadvantages on each parameter. However, SPAM is considered more suitable for SA measurement in pedestrian domains, which tend to be simple and stable with recurring patterns (Tong et al. 2018). SPAM predicts performance better in 'low' workload conditions, which matches everyday walking, an activity that does not impose a high workload. SPAM is also more sensitive in capturing differences in pedestrian environments. In real conditions, pedestrians are often crossing roads or moving through dense traffic, where stopping is difficult. SPAM is therefore more appropriate because it does not require stopping the activity to measure SA.

4.1 Findings

The SA measurements in pedestrian environments in this study yielded several findings. First, the two most widely used SA measurement techniques, freeze-probing and real-time-probing, were compared to determine which is more effective in pedestrian environments. Sensitivity, intrusiveness, and predictive validity were the parameters used to evaluate both probes. Second, SPAM, as a real-time probe, is more sensitive to changes in the pedestrian environment, but its intrusion effect on performance needs to be minimized. Third, SAGAT, as a freeze probe, predicts performance better than SPAM under higher workload conditions, while SPAM response time predicts better under low workload conditions.

4.2 Limitations and Future Study

The current study has several limitations. First, the results are based on experimental data from 11 participants. A larger sample will be required to obtain more convincing results. Second, future studies will need to consider other environmental factors, such as 'displays' relating to signs and regulations in pedestrian environments, to evaluate the sensitivity of both measurement methods further. Third, in evaluating intrusiveness, this study compared changes in performance from a relative perspective, namely between the SAGAT and SPAM experiments. This approach is less able to identify the difference in intrusion between the two SA measurements. Future evaluations based on an absolute perspective are expected to complement these results. In addition, performance should be measured over the same duration for SPAM and SAGAT, so that the negative effect of differing measurement durations, which may have lowered performance scores under SPAM, can be examined.

5 Conclusion

SA measurements are made to capture the actual effects of an environment's design concept. Researchers and designers should capture SA levels accurately, using appropriate measurement methods. Freeze-probing and real-time (online) probing, which differ in capability and perspective, have often been used to measure SA. This study compared the effectiveness of both techniques in a pedestrian environment. Freeze-probing was represented by the SAGAT method, while real-time-probing was represented by SPAM. Overall, the results showed that SPAM has better sensitivity than SAGAT in a pedestrian environment. However, SPAM causes more problems with intrusiveness than SAGAT, especially through the secondary-task effect it creates. SAGAT and SPAM can both predict performance, with SAGAT showing better results under higher workload conditions in a pedestrian environment. Based on its advantages, SPAM is considered more suitable for SA measurement needs in pedestrian environments that tend to be simple, stable, and do not cause high workloads.

Comparative studies assessing the effectiveness of SA measurements have been widely conducted. This study extends that comparison to a domain whose characteristics differ from those examined previously: pedestrian environments, which are simple and stable with recurring patterns.

Acknowledgements

The authors would like to thank the Indonesia Endowment Fund for Education (LPDP) and the Indonesian Ministry of Finance for the scholarship that supported the present study.

Declaration of interest statement

The authors declare no potential conflicts of interest concerning the research, authorship, and publication of this paper.

References

Alexander AL, Wickens CD (2005) Flightpath tracking, change detection and visual scanning in an integrated hazard display. Int Symp Aviat Psychol 68–72. https://doi.org/10.1177/154193120504900116
Chmielewski M, Kukiełka M, Frąszczak D, Bugajewski D (2018) Military and Crisis Management Decision Support Tools for Situation Awareness Development Using Sensor Data Fusion. Adv Intell Syst Comput 656:189–199. https://doi.org/10.1007/978-3-319-67229-8_17
Cohen J (1988) Statistical Power Analysis for the Behavioral Sciences, Second Ed. Lawrence Erlbaum Associates, Publishers, New York
Cuevas HM, Jones RET, Mossey MaE (2011) Team and Shared Situation Awareness in Disaster Action Teams. Proc Hum Factors Ergon Soc 55th Annu Meet 365–369
Cummings ML, Guerlain S (2007) Developing operator capacity estimates for supervisory control of autonomous vehicles. Hum Factors 49:1–15. https://doi.org/10.1518/001872007779598109
Durso FT, Bleckley MK, Dattel AR (2006) Does situation awareness add to the validity of cognitive tests? Hum Factors 48:721–733. https://doi.org/10.1518/001872006779166316
Durso FT, Hackworth CA, Truitt TR, et al (1998) Situation Awareness as a Predictor of Performance for En Route Air Traffic Controllers. Air Traffic Control Q 6:1–11. https://doi.org/10.2514/atcq.6.1.1
Endsley MR (1995) Measurement of situation awareness in dynamic systems. Hum Factors 37:65–84. https://doi.org/10.1518/001872095779049499
Endsley MR (2021) A Systematic Review and Meta-Analysis of Direct Objective Measures of Situation Awareness: A Comparison of SAGAT and SPAM. Hum Factors XX:1–27. https://doi.org/10.1177/0018720819875376
Endsley MR (1993) A Survey of Situation Awareness Requirements in Air-to-Air Combat Fighters. Int J Aviat Psychol 3(2):157–168. https://doi.org/10.1207/s15327108ijap0302
Endsley MR, Bolte B, Jones DG (2003) Designing for Situation Awareness
Endsley MR, Garland DJ (2000) Situation Awareness Analysis and Measurement
Endsley MR, Rodgers MD (1994a) Situation Awareness Information Requirements for En Route Air Traffic Control. Final Rep
Endsley MR, Rodgers MD (1994b) Situation Awareness Information Requirements Analysis for En Route Air Traffic Control. Proc Hum Factors Ergon Soc 38th Annu Meet 71–75
Endsley MR, Selcon SJ, Hardiman TD, Croft DG (1998) Comparative analysis of SAGAT and SART for evaluations of situation awareness. Proc Hum Factors Ergon Soc 1:82–86. https://doi.org/10.1177/154193129804200119
Endsley MR, Sollenberger R, Stein E (2000a) Situation awareness: A comparison of measures. Hum Performance, Situat Aware Autom User-Centered Des New Millenn
Endsley MR, Sollenberger RL, Nakata A, Stein ES (2000b) Situation Awareness in Air Traffic Control: Enhanced Displays for Advanced Operations. Federal Aviation Administration William J. Hughes Technical Center Atlantic City International Airport, NJ
Feld JA, Plummer P (2019) Visual scanning behavior during distracted walking in healthy young adults. Gait Posture 67:219–223. https://doi.org/10.1016/j.gaitpost.2018.10.017
Field A (2009) Discovering Statistics Using SPSS, Third Ed. SAGE Publication, London
Harms IM, van Dijken JH, Brookhuis KA, de Waard D (2019) Walking without awareness. Front Psychol 10:1–12. https://doi.org/10.3389/fpsyg.2019.01846
Hogg DN, Follesoe K, Volden FS, Torralba B (1994) SACRI: A measure of situation awareness for use in the evaluation of nuclear power plant control room systems providing information about the current process state. Proc Int At Energy Agency Spec Meet Adv Inf methods Artif Intell Nucl power plant Control rooms 26:166–174
Jones DG, Endsley MR (2004) Use of Real-Time Probes for Measuring Situation Awareness. Int J Aviat Psychol 14:343–367. https://doi.org/10.1207/s15327108ijap1404
Jones DG, Endsley MR (2000) Can real-time probes provide a valid measure of situation awareness? Proc Hum Performance, Situat Aware Autom User-Centered Des New Millenn 245–250
Keeler J, Battiste H, Hallett EC, et al (2015) May I Interrupt? The effect of SPAM Probe Questions on Air Traffic Controller Performance. Procedia Manuf 3:2998–3004. https://doi.org/10.1016/j.promfg.2015.07.843
Kraemer J, Süß HM (2015) Real Time Validation of Online Situation Awareness Questionnaires in Simulated Approach Air Traffic Control. Procedia Manuf 3:3152–3159. https://doi.org/10.1016/j.promfg.2015.07.864
Lee M, Lee H, Hwang S, Choi M (2021) Understanding the impact of the walking environment on pedestrian perception and comprehension of the situation. J Transp Heal 23:101267. https://doi.org/10.1016/j.jth.2021.101267
Li WC, Horn A, Sun Z, et al (2020) Augmented visualization cues on primary flight display facilitating pilot's monitoring performance. Int J Hum Comput Stud 135:102377. https://doi.org/10.1016/j.ijhcs.2019.102377
Loft S, Bowden V, Braithwaite J, et al (2015) Situation awareness measures for simulated submarine track management. Hum Factors 57:298–310. https://doi.org/10.1177/0018720814545515
Loft S, Morrell DB, Ponton K, et al (2016) The impact of uncertain contact location on situation awareness and performance in simulated submarine track management. Hum Factors 58:1052–1068. https://doi.org/10.1177/0018720816652754
Nasar JL, Troyer D (2013) Pedestrian injuries due to mobile phone use in public places. Accid Anal Prev 57:91–95. https://doi.org/10.1016/j.aap.2013.03.021
Neuman WL (2014) Social Research Methods: Qualitative and Quantitative Approaches, Seventh Ed. Pearson Education Limited, London
Nguyen T, Lim CP, Nguyen ND, et al (2019) A Review of Situation Awareness Assessment Approaches in Aviation Environments. IEEE Syst J 13:3590–3603. https://doi.org/10.1109/JSYST.2019.2918283
Paige Bacon L, Strybel TZ (2013) Assessment of the validity and intrusiveness of online-probe questions for situation awareness in a simulated air-traffic-management task with student air-traffic controllers. Saf Sci 56:89–95. https://doi.org/10.1016/j.ssci.2012.06.019
Pielot M, Krull O, Boll S (2010) Where is my team? Supporting situation awareness with tactile displays. Conf Hum Factors Comput Syst - Proc 3:1705–1714. https://doi.org/10.1145/1753326.1753581
Pierce RS (2012) The effect of SPAM administration during a dynamic simulation. Hum Factors 54:838–848. https://doi.org/10.1177/0018720812439206
Pierce RS, Strybel TZ, Vu KPL (2008) Comparing situation awareness measurement techniques in a low fidelity air traffic control simulation. ICAS Secr - 26th Congr Int Counc Aeronaut Sci 2008, ICAS 2008 1:3525–3532
Read GJ, Salmon PA, Lenné MG (2015) An explorative analysis of pedestrian situation awareness at rail level crossings. 11p
Salmon P, Stanton N, Walker G, Green D (2006) Situation awareness measurement: A review of applicability for C4i environments. Appl Ergon 37:225–238. https://doi.org/10.1016/j.apergo.2005.02.001
Salmon PM, Stanton NA, Walker GH, et al (2008) What really is going on? Review of situation awareness models for individuals and teams. Theor Issues Ergon Sci 9:297–323. https://doi.org/10.1080/14639220701561775
Salmon PM, Stanton NA, Walker GH, et al (2009) Measuring Situation Awareness in complex systems: Comparison of measures study. Int J Ind Ergon 39:490–500. https://doi.org/10.1016/j.ergon.2008.10.010
Sharma A, Nazir S, Ernstsen J (2019) Situation awareness information requirements for maritime navigation: A goal directed task analysis. Saf Sci 120:745–752. https://doi.org/10.1016/j.ssci.2019.08.016
Shaw EP, Rietschel JC, Hendershot BD, et al (2018) Measurement of attentional reserve and mental effort for cognitive workload assessment under various task demands during dual-task walking. Biol Psychol 134:39–51. https://doi.org/10.1016/j.biopsycho.2018.01.009
Sheik-nainar M, Pankok C, Zahabi M, Kaber D (2015) Effect of Locomotion Environment Familiarity and Cognitive Loading on Gait Control and Situation Awareness in Multitasking. 1–8
Silva HI, Grigoleit T, Burress MA, Fitzpatrick D (2017) Measuring the Impact of Console Operator Experience in a Simulated Petrochemical Refining Emergency Event. 527–531. https://doi.org/10.1177/1541931213601616
Silva HI, Ziccardi J, Grigoleit T, et al (2013) Are the intrusive effects of SPAM probes present when operators differ by skill level and training? In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). pp 269–275
Strybel TZ, Vu KL, Kraft J, Minakata K (2008) Proceedings of the Human Factors and Ergonomics Society. Proc Hum Factors Ergon Soc 11–15
Tong Y, Jia B, Wang Y, Yang S (2018) Detecting pedestrian situation awareness in real-time: Algorithm development using heart rate variability and phone position. Proc Hum Factors Ergon Soc 3:1579–1583. https://doi.org/10.1177/1541931218621357
Tse M-K, Li SYW, Chiu TH, et al (2020) Comparison of the Effects of Automated and Manual Record Keeping on Anesthetists' Monitoring Performance: Randomized Controlled Simulation Study. JMIR Hum Factors 7:e16036. https://doi.org/10.2196/16036
Uhlarik J, Comerford DA (2002) A Review of Situation Awareness Literature Relevant to Pilot Surveillance Functions
Walker GH, Stanton NA, Young MS (2008) Feedback and driver situation awareness (SA): A comparison of SA measures and contexts. Transp Res Part F Traffic Psychol Behav 11:282–299. https://doi.org/10.1016/j.trf.2008.01.003
Wolf F, Kuber R (2018) Developing a head-mounted tactile prototype to support situational awareness. Int J Hum Comput Stud 109:54–67. https://doi.org/10.1016/j.ijhcs.2017.08.002
Zhang T, Kaber D, Hsiang S (2010) Characterization of mental models in a virtual reality-based multitasking scenario using measures of situation awareness. Theor Issues Ergon Sci 11:99–118. https://doi.org/10.1080/14639220903010027
Zhang T, Yang J, Liang N, et al (2020) Physiological Measurements of Situation Awareness: A Systematic Review. Hum Factors 1–22. https://doi.org/10.1177/0018720820969071

No competing interests reported.
