Integrating autonomous vehicles (AVs) into mixed traffic environments requires them to coordinate with other road users in a safe, efficient, and socially acceptable manner, especially in traffic conflict situations such as negotiating intersections (Mariani et al. 2021; Vinkhuyzen and Cefkin 2016). Although there is no consensus on whether it is socially desirable for an AV to exhibit humanlike behaviour (Li et al. 2022; Liu et al. 2019; Mueller et al. 2020), it is widely recognized that AVs should at least be able to predict the intentions of other road users in order to decide their own course of action (Brown and Laurier 2017; Markkula et al. 2020; Schieben et al. 2019; Schwarting et al. 2019).
To tackle this issue, various approaches have been proposed, including (i) early prediction of a driver’s manoeuvre at specific road sections (e.g. intersections, roundabouts), offering a sufficient prediction window before a potential conflict (Zyner et al. 2017; 2018), (ii) modelling of vehicles’ behaviour using game theory in an attempt to build decision matrices that enable an AV to find a way to cross an intersection safely (Doniec et al. 2008; Fox et al. 2018), (iii) estimation of the social value orientation (SVO) of other drivers from observable trajectories, enabling an AV to predict their actions and adjust its own movements (Schwarting et al. 2019; Zhao et al. 2021), or even (iv) explicit communication between neighbouring vehicles, enabling them to reach an agreement through argumentation and thus avoid an impending conflict (e.g. when crossing an intersection) (Lippi et al. 2018). Each approach raises specific issues: poor fit of manoeuvre-based models across different road environments (Toghi et al. 2020); masking of the essential dynamics of a real-world scenario due to the way a model quantizes time and uses local information (Fox et al. 2018; Mariani et al. 2021); erroneous attribution of social intent to other drivers (Pletzer et al. 2018); and mistrust of other drivers and/or disagreements that delay overall traffic flow (Lamas et al. 2015). Besides these approach-specific issues, other authors (Alahi et al. 2016; Toghi et al. 2021) note that a key shortcoming common to all of them is their focus on dyadic interactions between road users in close proximity to each other, without considering how their behaviour may be affected by other road users in their immediate surroundings.
However, there is evidence that including surrounding agents’ behaviour can significantly improve prediction in proximal interactions. For instance, Alahi et al. (2016) introduced a social long short-term memory network to capture the subtle interactions that take place among pedestrians moving in dense crowds. They showed that pooling the interactions of neighbouring pedestrians in crowded spaces, which arise from social norms, made it possible to predict complex non-linear behaviours.
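The pooling idea can be illustrated with a short sketch. It is a simplified illustration written for this text, not the implementation of Alahi et al. (2016): each neighbour is assumed to carry a hidden-state vector, and the vectors of neighbours falling into the same cell of a square grid centred on the pedestrian of interest are summed. The function name `social_pool` and the parameters `grid_size` and `cell` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Neighbour = Tuple[Tuple[float, float], Sequence[float]]  # ((x, y), hidden state)


def social_pool(
    ego_xy: Tuple[float, float],
    neighbours: List[Neighbour],
    hidden_dim: int,
    grid_size: int = 4,   # assumed number of cells per side
    cell: float = 1.0,    # assumed cell width, in the same units as positions
) -> List[List[List[float]]]:
    """Sum neighbours' hidden states into a grid centred on the ego agent."""
    half = grid_size * cell / 2.0
    # pooled[m][n] is a hidden_dim vector, initialised to zeros.
    pooled = [
        [[0.0] * hidden_dim for _ in range(grid_size)]
        for _ in range(grid_size)
    ]
    for (x, y), hidden in neighbours:
        # Position of the neighbour relative to the ego agent.
        dx, dy = x - ego_xy[0], y - ego_xy[1]
        m = math.floor((dx + half) / cell)
        n = math.floor((dy + half) / cell)
        # Neighbours outside the grid do not contribute.
        if 0 <= m < grid_size and 0 <= n < grid_size:
            for k in range(hidden_dim):
                pooled[m][n][k] += hidden[k]
    return pooled


if __name__ == "__main__":
    grid = social_pool(
        ego_xy=(0.0, 0.0),
        neighbours=[
            ((0.5, 0.5), [1.0, 2.0]),
            ((0.6, 0.4), [0.5, 0.5]),
            ((10.0, 10.0), [9.0, 9.0]),  # too far away, ignored
        ],
        hidden_dim=2,
        grid_size=2,
    )
    print(grid[1][1])
```

In this sketch the two nearby neighbours share a cell, so their states are merged into a single vector, while the distant one is ignored. In a learned model the pooled grid would be fed, together with the agent's own state, into the next prediction step.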
Still, the role of surrounding road users in shaping the coordination between two interacting road users in close proximity has received little attention, even in widely accepted driving behaviour models (e.g. Hollnagel et al. 2003; Michon 1985; for reviews, see Carsten 2007; Panou et al. 2007).
As Renner and Johansson (2006) noted, a single-driver perspective offers only a limited understanding of how drivers coordinate their driving in a traffic situation with multiple road users. They therefore proposed a model based on the interpredictability of participants’ attitudes and actions, derived from the concept of joint activity developed by Clark (1996) and further elaborated by Klein et al. (2005).
According to Klein et al. (ibid.), the two basic requirements for effective coordination are, on the one hand, a (tacit) commitment of all parties to support the process of coordination, the so-called Basic Compact, and, on the other hand, a level of confidence that the use of abbreviated forms of communication will not lead to coordination failures, the so-called Common Ground. Sustaining common ground is described in terms of three components: (i) initial common ground, referring to participants’ knowledge of the formal rules and conventions associated with the particular joint task, as well as their shared scripts about the expected behaviour of the other parties; (ii) public events, referring to participants’ knowledge of the event history, which may bring to the fore options that were lost during the course of the joint activity; and, finally, (iii) the current state of the activity, referring to cues in the physical scene that enable participants to predict subsequent actions and coordinate appropriately.
The above concepts can readily be illustrated in traffic situations that demand coordination between AVs and human drivers. Using publicly available videos of AVs, Brown and Laurier (2017) presented a number of traffic episodes in which an AV’s non-adherence to conventions and associated scripts led to a fundamental breakdown of common ground with the other parties involved. For example, in one episode at a four-way intersection, the AV’s edging forward acted as a signal to other drivers of its intention to take its ‘slot’ to cross. However, the absence of subsequent movement (e.g. creeping) was interpreted by other drivers as a signal that the AV had abandoned its initial intention for the moment, leaving the ‘slot’ to be taken by another car. Consequently, the AV’s ‘allowing’ the first car to take the ‘slot’ was interpreted by the driver of the car following the crossing one as a sign that the AV would remain at a standstill. But by the time the second car was crossing the intersection, the AV had also moved off, resulting in abrupt braking by both the AV and another car behind it.
The episode above vividly shows that initial common ground (i.e. shared scripts) among the parties involved in a joint activity is neither static nor monolithic. Depending on the public events (e.g. momentary inertia of the AV), the shared scripts are continuously re-adapted or, worse, diverging scripts may arise among participants, with different expectations about the imminent actions of others. Accordingly, the situation created by the coordination between two interacting road users (the current state of the activity), although less informative for sustaining coordination between the two of them, provides surrounding road users with cues about emerging scripts and possible opportunities to mingle in. Such mingling will be realized if it is judged socially acceptable. As Deppermann (2019) and Laurier (2019) note, the evolution of coordination binds participants to the moral order of the unfolding event, e.g. trusting the AV’s signs of yielding behaviour, drivers’ appreciation of being allowed to cross before the AV, and recognizing when a road user’s sacrifice of their right is abused by others.
Until now, most studies examining human road users’ decision making at intersections have focused on factors contributing to coordination failures, such as improper lookout, driver age, and driving inexperience (Bao and Boyle 2009; Herslund and Jørgensen 2003; Summala et al. 1996; Xu et al. 2014). Other studies focus on drivers’ behavioural patterns at intersections that reflect risky decision making. Examples include drivers’ partial compliance with stop-sign regulation, resulting in the prevalence of rolling stops instead of full stops (Feest 1968; Lebbon et al. 2007; McKelvie 1986; 1987; Retting et al. 2003; Wen et al. 2021), and drivers’ duels for priority based on informal rules, e.g. a ‘first come, first served’ tendency, or yielding to drivers who maintain their speed and/or come from a wider road (Björklund and Åberg 2005; De Ceunynck et al. 2013).
As yet, the pervasive role of third road user(s) in the negotiation between two road users in cross-intersection episodes has not been explicitly studied. Human drivers readily recognize situational constraints affecting other road users’ course of action and often exploit them to their advantage. Therefore, integrating such situational constraints and/or opportunities into AVs’ prediction models may be pivotal for their deployment in mixed traffic. In this study, we analysed crossing episodes between two intersecting vehicles whose evolution was clearly affected by a third road user. The main objectives were, first, to reveal recurring patterns in which the right-of-way is altered due to the presence of a third road user and, second, to identify situational invariances in traffic that AVs could eventually use to predict and comply with such patterns.