Predicting Human Exchange Decision-Making with Theoretically Informed Data and Machine Learning

doi:10.21203/rs.3.rs-2464759/v1

Artificial agents that can predict human decisions in social exchange contexts can potentially help to facilitate cooperation and promote prosocial behaviours. Modelling human decision-making is difficult in social exchange contexts, where multiple contending motives inform decisions in rapidly evolving situations. We propose a mixed Theory and Data-Driven (TD2) model comprising three modules: (1) a clustering algorithm that identifies strategies in interactive social exchange contexts; (2) an artificial neural network that classifies an exchange decision into one of the identified strategies based on empirically defined motives and the observable differences during social exchanges; and (3) a hidden Markov model that predicts situated human decisions based on the strategies applied by humans over time. The TD2 decision-making model was trained and tested using 7,840 exchanges from "minimal group" experimental exchange games in which decisions were motivated by group ties, wealth aspiration, and interpersonal ties. The model classified exchange behaviours into strategies with 95% accuracy. Reciprocity, fairness and in-group favouritism were predicted, as separate decisions, with accuracies of 81%, 57% and 71%, respectively. The performance of the model improved over time. Future work will evaluate the model in a live experiment involving Human-Agent Cooperation (HAC).

Physical sciences/Mathematics and computing/Computer science

Biological sciences/Psychology/Human behaviour

Social exchange decision-making

human-agent cooperation

HAC

Virtual Interaction Application

hidden Markov model

In a world where artificial agents have pervaded society, the need to improve Human-Agent Cooperation has grown. By predicting human decisions in interactive social exchanges, artificial agents can potentially help to facilitate cooperation and promote prosocial behaviours. Recent studies (e.g., ¹) have used strategic games to investigate the effects of agents on social interaction and cooperation. Whereas strategic games such as the Prisoner’s Dilemma and Go (or Weiqi) have optimal outcomes, interactive social exchanges like gift giving (1) involve multiple obscure motives, such as fairness, selfishness and in-group favouritism, (2) whose relative importance evolves through interaction, and (3) occur in situations that present multiple, overlapping behavioural demands that are poorly differentiated from each other.² Think of the multiple obscure motives that underlie giving your boss a gift. While decision-making can be predicted easily when motives are known with certainty ^3–5, the obscure motives in interactive social exchanges provide only ambiguous cues for predicting behaviour. We propose a method for predicting decision-making in exchange environments as part of a larger initiative to develop adaptive agents capable of cooperating with humans in social exchange. Specifically, this study develops a model that can infer players’ strategies during interactive social exchanges and predict exchange decisions based on these strategies.

Exchange decisions were studied in the context of a simple experimental game ⁶ in which members of two 7-player groups were required to allocate a single token to any player in each of 40 rounds. Players had to choose whom to give the token to, knowing that a player’s token balance represents that player’s wealth. Players’ token balances indicate how rich or poor they are relative to others in each round of the game. We define an exchange decision in terms of three definite behaviours: (1) reciprocating a gift in the next round, (2) giving to a rich player (seeking power) versus a poor player (fairness) and (3) out-group versus in-group giving. Each exchange decision was coded as the presence (1) or absence (0) of each definite behaviour, giving a three-digit binary code in the order listed above. For example, 001 indicates a non-reciprocal allocation to a poor out-group player, whereas 111 is a reciprocal allocation to a rich out-group player.
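To make the coding scheme concrete, here is a minimal Python sketch that converts between the three definite behaviours and the three-digit code. The function names are hypothetical; only the bit order (reciprocity, rich versus poor, out-group versus in-group) comes from the text.

```python
# Bit order from the text: (1) reciprocation, (2) rich (1) vs poor (0) recipient,
# (3) out-group (1) vs in-group (0) recipient.

def encode_decision(reciprocated: bool, to_rich: bool, to_outgroup: bool) -> str:
    """Return the three-digit binary code of one exchange decision."""
    return "".join("1" if flag else "0" for flag in (reciprocated, to_rich, to_outgroup))


def describe_decision(code: str) -> str:
    """Turn a code such as '001' back into a readable description."""
    if len(code) != 3 or set(code) - {"0", "1"}:
        raise ValueError(f"not a three-digit binary code: {code!r}")
    recip = "reciprocal" if code[0] == "1" else "non-reciprocal"
    wealth = "rich" if code[1] == "1" else "poor"
    group = "out-group" if code[2] == "1" else "in-group"
    return f"{recip} allocation to a {wealth} {group} player"


print(encode_decision(False, False, True))  # 001
print(describe_decision("111"))  # reciprocal allocation to a rich out-group player
```

With three binary behaviours there are exactly eight possible exchange decisions.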

Individuals in interactive social exchange often make exchange decisions as part of a strategy that supports their motives. A strategy was defined as a complex plan that guides an individual’s exchange decisions such that each exchange decision conforms to the plan. For example, individuals whose motive is to strengthen in-group ties often allocate their tokens to in-group members as a strategy. An intrinsic complexity of interactive social exchange is that the same strategy can lead to several exchange decisions, and several strategies can lead to the same exchange decision. For example, allocating tokens to in-group members may be motivated not by in-group altruism but by self-enrichment, pursued by building trust and eliciting reciprocation. This study develops a Theory and Data-Driven (TD2) model to simplify prediction and improve its accuracy in these challenging contexts of obscure, overlapping and evolving motives. We tackle these complications by identifying definite behaviours and then examining how they combine and accumulate over time into complex strategies. These strategies then influence the exchange decisions individuals make in a particular context.

The contribution of the paper is threefold. First, it provides a novel method of integrating theory into machine learning models via theory-driven data. Second, it demonstrates how theoretically grounded motives can be used to infer strategies and predict exchange decisions. Third, it contributes to the research on improving models of decision-making in Human-Agent Cooperation (HAC).

Decision-Making Models of Social Exchanges

Previous models of decision-making relied on rule-based ^7,8, theory-based ^9,10 or data-driven ¹¹ algorithms. The if-then rules proposed by Nilsson ⁷ consist of three basic components: (1) a set of rules, (2) a knowledge warehouse where information relevant to the problem domain (e.g., rule precedence) is stored, and (3) a rule interpreter that determines the line of action the actors will perform in a specific context. A general criticism of rule-based models is that the actors follow predefined rules and the model does not learn from the past ¹²; such models are therefore unsuited to dynamic environments, such as social exchange settings, where information from past exchanges shapes the current decision.

Conversely, most theory-based models (e.g., ¹⁰) represent decision-making as a set of probabilities. Enayat, et al. ¹⁰ developed a theory-based computational model of social exchange as a set of probabilities based on Homans’ propositions ^13,14: success, value, stimulus, deprivation-satiation, aggression-approval, and rationality. For example, the probability of performing an action is proportional to the value of the reward previously obtained from performing that action (the value proposition). Although this approach showed how feedback (adjustment of behaviour based on experience) can inform decision-making, a key limitation is its inability to consider how the emergent structure or state of the system affects the interaction among actors. For example, Enayat, et al. ¹⁰ reported that the absence of emotional factors, such as feelings and anger, causes less group distinction. Such distinctions are emergent states of the system that could affect an actor’s choices during transactions but were not considered when the model was formulated. A more general problem with theory-based computational models is that mathematical representations of motives rely on assumptions that may not reflect actual motives. Rather than using mathematical representations, we deduce possible motives from theoretical knowledge, represent these motives as data that feed into the model, and allow individual motives to change over time owing to the stochastic nature of interactive social exchange.
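As a toy illustration of the value proposition only (not a reconstruction of Enayat et al.’s model), the sketch below turns hypothetical past reward values into action probabilities proportional to those values and then samples an action. The action names and reward values are made up.

```python
import random


def value_proposition_probabilities(reward_values: dict[str, float]) -> dict[str, float]:
    """Probability of each action, proportional to the reward value it previously earned."""
    if any(v < 0 for v in reward_values.values()):
        raise ValueError("reward values must be non-negative")
    total = sum(reward_values.values())
    if total == 0:
        raise ValueError("at least one action needs a positive reward value")
    return {action: value / total for action, value in reward_values.items()}


def choose_action(reward_values: dict[str, float], rng: random.Random) -> str:
    """Sample one action using the value-proportional probabilities."""
    probs = value_proposition_probabilities(reward_values)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


values = {"give_in_group": 3.0, "give_out_group": 1.0}   # hypothetical past rewards
print(value_proposition_probabilities(values))
# {'give_in_group': 0.75, 'give_out_group': 0.25}
print(choose_action(values, random.Random(1)))
```

Because the probabilities are fixed by past rewards alone, nothing in this rule reacts to the emergent state of the group, which is the limitation the paragraph above points out.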

Data-driven models are suited to representing the highly stochastic and dynamic ^15,16 nature of human decision-making. Kavak and colleagues ¹¹ have provided a template for using data from human experiments to inform and calibrate models, thereby focusing on the properties and behavioural patterns of the humans they represent. Although feeding data into computational models is a decades-old practice, it has become popular with the increasing use of agent-based models and machine learning. Many studies ^17–19 have integrated machine learning into agent-based models of decision-making. For example, Augustijn, et al. ¹⁹ used machine learning to generate decision rules from data. Although this method ensures that the decision rules represent the actual decision-making of the individuals being considered, it relies on the availability of a large dataset that represents a variety of behaviours in the context. If the dataset is small, there is no guarantee that the decisions will be well captured in the rules. Care must therefore be taken to ensure the quality of a small dataset so that the decision rules generated from it are reliable. A quality dataset (i.e., one containing enough information to capture decision patterns) is vital for accurate or near-accurate prediction of human decision-making.

Given that a large dataset is not available in the interactive social exchange context considered in this study, we ensure the quality of the available dataset by combining machine learning with Edmonds and Moss’s ²⁰ ‘Keep it descriptive, stupid’ (KIDS) principle, which holds that models should be as descriptive as possible. This study applies KIDS through a theoretical framework for describing and understanding the possible motives of individuals in interactive social exchanges. Guided by this framework, we generate data that were not directly observable during the interactive social exchanges, improving the quality of the available dataset.

This section presents the proposed model, the configurations, and the training and evaluation of the model. The conceptual model represented in Fig. 1 shows the relationships between the key concepts in our proposed model. We discuss how these concepts were operationalized in our model.

Figure 1 shows how each stage in the model feeds into the next. It also shows how exchange decisions feed back into the model to inform subsequent decisions. Stages 1 and 2 are collectively termed the learning phase, while stages 3 and 4 are collectively termed the prediction phase. In the learning phase, we theoretically identified motives (internal states) for exchange decisions. The observed behaviours in this phase are the combinations of observable differences between individuals and the definite behaviours that signify the individuals’ motives. Subsequently, cluster analyses of these observed behaviours were used to impute unobserved game strategies (the internal states of the prediction phase), which are then used to predict exchange decisions (the observable behaviour of the prediction phase). Further explanations are provided in the following subsections. The material (code, data and analyses) for this model can be found via the link: https://tinyurl.com/bdd46utb.

Social Exchange Data and Exchange Decisions

The proposed model aims to predict human exchange decisions during interactive social exchanges. The model was trained and tested using interactive social exchange data and theoretical knowledge of the possible motives underlying social exchange decisions. The data were obtained from the Virtual Interaction Application (VIAPPL) 2013 and 2014 experiments reported by Durrheim, et al. ⁶. In the experiments, individuals were required to allocate a single token per round to any player in the two 7-player groups. Players chose recipients on the basis of the observable differences between them (e.g., token balance, group identity, and previous allocations).

An individual’s token balance in each round shows how wealthy the individual was in that round; the token balance was therefore regarded as an indicator of the individual’s status. Status was measured relative to the wealth of the other individuals in the game, i.e., we measure how wealthy an individual is in comparison with the other individuals, using \(\text{Status} = a/\max(A)\), where \(a\) is the token balance of an individual in a round, and \(A\) is the vector of the token balances of all the players in that round. Status is thus a real number in the range 0 to 1 inclusive. An individual is of high status if the individual’s status is greater than the average status in the round; otherwise, the individual is of low status. Group identity simply indicates the group to which the individual belongs, represented as 1 and 2 for group 1 and group 2, respectively. Previous allocation indicates in-group giving, out-group giving or self-giving in the previous round, represented as 0, 1 or 2, respectively.
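A minimal sketch of the status measure and the high/low split, assuming a round’s token balances are available as a plain list (the balances below are made up):

```python
from statistics import mean


def statuses(balances: list[float]) -> list[float]:
    """Status = a / max(A) for every player's token balance a in one round."""
    top = max(balances)
    if top <= 0:
        raise ValueError("status is undefined when no player holds any tokens")
    return [a / top for a in balances]


def high_status(balances: list[float]) -> list[bool]:
    """True where a player's status exceeds the round's average status."""
    s = statuses(balances)
    avg = mean(s)
    return [value > avg for value in s]


round_balances = [10, 5, 0, 20]           # hypothetical balances for four players
print(statuses(round_balances))           # [0.5, 0.25, 0.0, 1.0]
print(high_status(round_balances))        # [True, False, False, True]
```

Under the rule as stated ("greater than the average"), a round in which all balances are equal classes every player as low status; how the study handled that case is not specified here.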

The experimental games were randomly assigned to conditions in which players and groups started with either equal or unequal token balances. The data for each allocation in each round record the game identifier, the group number, the experimental condition, the starting and ending token balances, and directed ties showing player-to-player allocations. These ties provide traces of the relational interdependencies (e.g., competition and cooperation) that develop between interacting individuals and groups. The unprocessed data comprise 2 (conditions) × 40 (rounds) × 14 (participants) × 5 (games) exchanges from VIAPPL 2013 and 1 (condition) × 40 (rounds) × 14 (participants) × 4 (games) exchanges from VIAPPL 2014, reported in ⁶, giving 7,840 exchanges in total. To reduce noise, data from the first rounds were not used, because players were likely to allocate their tokens randomly in those rounds.
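The total of 7,840 exchanges follows directly from the two design products; a quick check:

```python
# Exchanges in the unprocessed VIAPPL data: conditions x rounds x participants x games.
viappl_2013 = 2 * 40 * 14 * 5
viappl_2014 = 1 * 40 * 14 * 4
total_exchanges = viappl_2013 + viappl_2014
print(viappl_2013, viappl_2014, total_exchanges)  # 5600 2240 7840
```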

Each exchange records the presence or absence of each definite behaviour: (1) reciprocity, defined as giving to the player who gave to you in the previous round, versus non-reciprocity; (2) giving to a rich versus a poor player; and (3) giving to the out-group versus the in-group. Table 1 presents the exchange decisions and their explanations as used in the current study. An act of reciprocation is coded as 1 and its absence as 0. Likewise, giving to a rich player is coded as 1 (giving to a poor player as 0), and giving to the out-group is coded as 1 (giving to the in-group as 0).
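The three bits can in principle be derived from raw round data. The sketch below assumes a hypothetical data layout (plain dictionaries for the previous round’s allocations, the recipients’ high-status flags and group identities); self-allocations are not treated specially, since the text does not say how they were coded.

```python
def decision_code(giver, receiver, prev_alloc, is_high_status, group):
    """Three-digit code (reciprocity, rich, out-group) for one allocation.

    prev_alloc:     dict giver -> receiver for round r-1 (hypothetical layout)
    is_high_status: dict player -> bool, the recipient's status flag in the current round
    group:          dict player -> group identity (1 or 2)
    """
    reciprocated = prev_alloc.get(receiver) == giver   # receiver gave to giver last round
    to_rich = is_high_status[receiver]
    to_outgroup = group[receiver] != group[giver]
    return "".join("1" if flag else "0" for flag in (reciprocated, to_rich, to_outgroup))


group = {"A": 1, "B": 1, "C": 2}
prev_alloc = {"B": "A", "C": "A"}          # in round r-1, B and C both gave to A
is_high_status = {"A": False, "B": False, "C": True}
print(decision_code("A", "C", prev_alloc, is_high_status, group))  # 111
print(decision_code("A", "B", prev_alloc, is_high_status, group))  # 100
```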

Motives

This section shows how definite behaviours signify the motives underlying exchange decisions. The theoretical framework in Fig. 2 identifies three motives, namely group ties, wealth aspiration, and interpersonal ties, that are likely to inform exchange behaviours. These motives, inferred from previous receipts (in the case of ties) and previous allocations (in the case of wealth aspiration), are the theoretically identified motives that account for the presence or absence of the definite behaviours in exchange decisions. The strength of each motive is measured from past observed behaviour and is used to impute strategies, which in turn are used to predict future exchange decisions. For example, an individual with a strong motive for fairness in the game will often give a token to the poorest player (a definite behaviour), irrespective of that player’s group.

Group ties measure the relationship between individuals and groups. Individuals and groups adopt various identity management strategies aimed at creating positive social identities ^21,22. Social identity theory (SIT) ²³ provides a theoretical lens for understanding the social psychological motives of social exchanges. The theory argues that individuals categorise themselves as group members in intergroup contexts ²², compare the status of the in-group and out-group, and are motivated to identify with groups that are positively valued within the hierarchy ²⁴. These processes occur as the individual seeks to achieve a positive self-perception, or a high sense of self-worth ²⁵. Prejudice and discrimination occur as an expression of this positive distinctiveness motive, which appears as in-group favouritism or parochial altruism in intergroup exchange experiments ^26,27.

Bounded generalised reciprocity (BGR) theory ^28,29 argues that parochial altruism is ultimately motivated by self-interest. Individuals favour in-group members because they are expected to do so. To avoid acquiring a bad reputation and being excluded from the exchange network, individuals favour the in-group over the out-group. In other words, individuals expect profitable and advantageous interactions with in-group members because they expect favours to be more likely reciprocated by in-group members than by out-group members ^26,28.

Both SIT and BGR expect that in-group favouring behaviour will strengthen the bond between in-group members. We refer to this bond as a group tie. An individual’s motive may be to strengthen the bond with the in-group (or even with the out-group) to maintain a good reputation. Therefore, group ties were defined in terms of in-group versus out-group exchange.

For each player, we measure the strength of each tie (in-group and out-group) by determining how often in-group and out-group members allocate tokens to the individual (see Table 2). The in-group relationship is the ratio \(R_r = T_{\text{in}}/N_{\text{in}}\) of the number of tokens received from in-group members (\(T_{\text{in}}\)) to the number of in-group members (\(N_{\text{in}}\)) in round \(r\). Because the game rules allow each player only one token allocation per round, \(0 \le R_r \le 1\): \(T_{\text{in}} = N_{\text{in}}\) gives \(R_r = 1\) (a very strong bond), while \(T_{\text{in}} = 0\) gives \(R_r = 0\) (no bond). The out-group relationship, i.e., the ratio of the number of tokens received from out-group members to the number of out-group members, is calculated in the same way, with \(T_{\text{in}}\) replaced by \(T_{\text{out}}\) and \(N_{\text{in}}\) replaced by \(N_{\text{out}}\). An individual may have a strong bond with both in-group and out-group members; the in-group and out-group relationships are therefore two separate measures.
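A minimal sketch of the two tie measures for one round. It assumes that the focal player is not counted among their own in-group members (the text does not say whether self-allocations count towards \(T_{\text{in}}\)); the data layout and names are hypothetical.

```python
def group_ties(player, allocations, group):
    """Return (in-group tie, out-group tie) for `player` in one round.

    allocations: dict giver -> receiver for the round (hypothetical layout)
    group:       dict player -> group identity
    Assumption: the player is excluded from N_in, so self-allocations are ignored.
    """
    in_members = [p for p in group if p != player and group[p] == group[player]]
    out_members = [p for p in group if group[p] != group[player]]
    t_in = sum(1 for p in in_members if allocations.get(p) == player)     # T_in
    t_out = sum(1 for p in out_members if allocations.get(p) == player)   # T_out
    return t_in / len(in_members), t_out / len(out_members)


group = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2}
allocations = {"A": "D", "B": "A", "C": "A", "D": "A", "E": "B"}
print(group_ties("A", allocations, group))  # (1.0, 0.5)
```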

In most cases, the comparison between in-group and out-group is made relative to status (low/high/equal status), which is associated with power. Wealthy individuals are often seen as powerful individuals. Thus, wealth aspiration underpins two exchange behaviours: giving to the rich (or seeking power) and giving to the poor (or fairness).

Capraro, et al. ⁴ suggested that fairness can be the basis on which some individuals interact. According to Tajfel and Turner ²⁷, interaction may be moderated by the legitimacy (perceptions of fairness) of the status hierarchy. When perceptions of fairness are low, low-status group members challenge the status quo by strengthening in-group reciprocal behaviours and in-group favouritism. High-status group members may either enter into intergroup competition or rectify the injustice through out-group altruism, giving to the poor. In contrast, when the situation is viewed as legitimate, low-status group members may seek to enrich themselves individually by making exchanges with rich out-group members.

Wealth aspiration was measured relative to the wealth of the player to whom the individual allocates a token, that is, the individual’s aspiration to associate with the poor (fairness) or the rich (power-seeking). Associating with the poor means allocating one’s token to a low-status individual, while associating with the rich means allocating one’s token to a high-status individual in the round. The former is considered fair, while the latter is considered power-seeking. A participant’s wealth aspiration in round \(r\) is calculated from the allocation made in round \(r-1\) as \(\text{WealthAspiration} = a_{r-1}/\max(A_{r-1})\), where \(a_{r-1}\) is the start token balance (the balance before allocation) of the player to whom the individual allocated a token in round \(r-1\), and \(A_{r-1}\) is the vector of the start token balances of all players in round \(r-1\).
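The same ratio form applies to wealth aspiration. The sketch below assumes the previous round’s start balances are held in a dictionary keyed by player (names and numbers hypothetical):

```python
def wealth_aspiration(recipient_prev, start_tokens_prev):
    """WealthAspiration for round r = a_{r-1} / max(A_{r-1}).

    recipient_prev:    player who received the individual's token in round r-1
    start_tokens_prev: dict player -> start token balance in round r-1
    """
    top = max(start_tokens_prev.values())
    if top <= 0:
        raise ValueError("undefined when every start balance is zero")
    return start_tokens_prev[recipient_prev] / top


start_tokens = {"A": 12, "B": 3, "C": 6}
print(wealth_aspiration("B", start_tokens))  # 0.25 (gave to a poor player)
print(wealth_aspiration("A", start_tokens))  # 1.0 (gave to the richest player)
```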

Interpersonal ties of trust are built by reciprocity. Relationships between individuals are more trusting when exchanges occur without explicit negotiations between members ^30,31. Trustworthy individuals gain positive reputations; they are likely to be rewarded by other individuals ³², and their actions are more likely to be reciprocated, especially by the in-group members. As shown by De Dreu, et al. ³³, expectations of reciprocity can promote in-group interactions, and reciprocation. But powerful reciprocity norms also motivate reciprocation with individuals who are not in-group members.

Interpersonal ties were measured in terms of the reciprocation motive, that is, A’s desire to allocate a token to another participant B who allocated a token to A in the previous round. This motive was measured by the history of reciprocity: the presence or absence of reciprocity in the previous round, coded as 1 and 0, respectively.

Motives and experience (represented as observed behaviours) form the basis on which individuals in social interaction plan and/or adjust their plans. We refer to the combination of motives and observable differences between individuals as features used to impute strategies in social interaction. In this study, features are group ties, wealth aspiration, reciprocity, status, group identity, and previous allocation. These features form input to the cluster analysis used to determine strategies in the game. Table 3 summarises the features while Table 4 shows the representation of the features as input to the cluster analysis.
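One way to hold a single row of cluster input is a small dataclass. The field names are hypothetical and Table 4’s actual column layout may differ; only the feature list, the value ranges and the previous-allocation codes (0, 1, 2) come from the text.

```python
from dataclasses import asdict, dataclass

PREVIOUS_ALLOCATION = {"in-group": 0, "out-group": 1, "self": 2}


def previous_allocation_code(giver, receiver, group):
    """0 = in-group, 1 = out-group, 2 = self-giving (codes as in the text)."""
    if giver == receiver:
        return PREVIOUS_ALLOCATION["self"]
    same_group = group[giver] == group[receiver]
    return PREVIOUS_ALLOCATION["in-group" if same_group else "out-group"]


@dataclass
class FeatureRow:
    """One exchange decision by one player in one round (field names hypothetical)."""
    ingroup_tie: float          # T_in / N_in
    outgroup_tie: float         # T_out / N_out
    wealth_aspiration: float    # a_{r-1} / max(A_{r-1})
    reciprocity: int            # 1 if reciprocity was present in the previous round, else 0
    status: float               # a / max(A)
    group_identity: int         # 1 or 2
    previous_allocation: int    # 0, 1 or 2
    decision: str               # three-digit exchange decision code, e.g. "001"

    def __post_init__(self):
        for name in ("ingroup_tie", "outgroup_tie", "wealth_aspiration", "status"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1]")
        if self.reciprocity not in (0, 1) or self.group_identity not in (1, 2):
            raise ValueError("reciprocity must be 0/1 and group_identity 1/2")
        if self.previous_allocation not in (0, 1, 2):
            raise ValueError("previous_allocation must be 0, 1 or 2")


group = {"A": 1, "B": 1, "D": 2}
row = FeatureRow(1.0, 0.5, 0.25, 1, 0.5, 1, previous_allocation_code("A", "D", group), "001")
print(asdict(row))
```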

Strategies

This section identifies the unobserved strategies in the minds of players by finding patterns of association between the features of each player’s exchange behaviours over the course of the game. Note that each row in Table 4 represents a single exchange decision by one player in one round. The actual decision, in its three-digit code (see Table 1), is recorded in the final column. The “Features” columns record observable features that characterise the motives and game features of the decision. We use cluster analysis to identify and categorise patterns of exchange behaviours based on the co-occurrence of features. These categories of exchange behaviours represent unseen and implicit strategies.

As shown by Zaki, et al. ³⁴, clustering behaviours can yield high performance in predicting exchange decisions. The partitioning around medoids (PAM) clustering algorithm in R ³⁵ was applied with Gower distance ³⁶ as the distance measure. Although other distance measures, such as Euclidean and Manhattan distance, can be used (see ³⁷), Gower distance is well suited to, and performs well on, domains with mixed data types, i.e., categorical and numerical data ^36,38.
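A pure-Python sketch of Gower dissimilarity (equal feature weights, no missing values) may help show why it suits mixed data: numeric columns contribute a range-scaled absolute difference, categorical columns a 0/1 mismatch. In R, `cluster::daisy(x, metric = "gower")` computes this measure; whether the study used that particular function is not stated, and the toy rows below are hypothetical.

```python
def gower_matrix(rows, categorical):
    """Pairwise Gower dissimilarities for mixed-type rows.

    rows:        list of equal-length tuples
    categorical: set of column indices compared by simple matching (0 if equal, 1 otherwise);
                 all other columns are numeric and scaled by their observed range.
    """
    n_cols = len(rows[0])
    ranges = {j: max(r[j] for r in rows) - min(r[j] for r in rows)
              for j in range(n_cols) if j not in categorical}
    n = len(rows)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            total = 0.0
            for j in range(n_cols):
                if j in categorical:
                    total += 0.0 if rows[i][j] == rows[k][j] else 1.0
                elif ranges[j] > 0:                 # a constant column contributes 0
                    total += abs(rows[i][j] - rows[k][j]) / ranges[j]
            D[i][k] = D[k][i] = total / n_cols      # equal weights for every feature
    return D


# Toy rows: (status, group identity, previous allocation); values hypothetical.
rows = [(0.5, 1, 0), (1.0, 2, 1), (0.0, 1, 0)]
D = gower_matrix(rows, categorical={1, 2})
print([[round(d, 3) for d in row] for row in D])
# [[0.0, 0.833, 0.167], [0.833, 0.0, 1.0], [0.167, 1.0, 0.0]]
```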

The silhouette width, also referred to as the silhouette coefficient ³⁹, was used to determine the optimal number of clusters. It measures both within-cluster cohesion and the separation between clusters. For a data sample \(i\), let \(a(i)\) be its mean distance to the other members of its own cluster and \(b(i)\) its smallest mean distance to the members of any other cluster; the silhouette width is then \(s(i) = (b(i) - a(i))/\max(a(i), b(i))\). It ranges from −1 to 1: a large \(s\) (near 1) implies that the sample is well clustered, a small \(s\) (near 0) implies that it lies between clusters, and a negative \(s\) implies that it has been placed in the wrong cluster. Thus, the higher the (average) silhouette width, the better the clustering.
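The silhouette definition can be computed directly from a dissimilarity matrix. In this sketch a singleton cluster is given \(s = 0\) by convention, and the one-dimensional toy data are hypothetical; R’s `cluster::silhouette()` provides the same quantity for a fitted partition.

```python
def silhouette_widths(D, labels):
    """s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every sample; needs at least two clusters."""
    n = len(D)
    clusters = set(labels)
    widths = []
    for i in range(n):
        own = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            widths.append(0.0)                    # singleton cluster: s = 0 by convention
            continue
        a = sum(D[i][j] for j in own) / len(own)  # mean distance within own cluster
        b = min(                                  # lowest mean distance to another cluster
            sum(D[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


points = [0.0, 1.0, 10.0, 11.0]                   # toy one-dimensional data
D = [[abs(p - q) for q in points] for p in points]
good = silhouette_widths(D, [0, 0, 1, 1])         # natural grouping
bad = silhouette_widths(D, [0, 1, 0, 1])          # deliberately wrong grouping
print(round(sum(good) / 4, 3), round(sum(bad) / 4, 3))  # 0.9 -0.45
```

The average of these widths over all samples is the score compared across candidate numbers of clusters.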

The clustering procedure and results

The optimal number of clusters was determined experimentally by obtaining the silhouette width for two to 20 clusters. Figure 3 plots the number of clusters on the x-axis against the silhouette width on the y-axis. The optimal number of clusters is indicated by the highest silhouette width. The plot shows that the optimal number of clusters is six, with a silhouette width of 0.685.
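For readers who want to see the clustering step itself, the following is a simplified PAM-style k-medoids sketch on a precomputed dissimilarity matrix (greedy BUILD followed by a first-improvement SWAP). It is not R’s `cluster::pam()`, which implements the full algorithm, and the toy data are hypothetical.

```python
def pam(D, k):
    """Simplified PAM (k-medoids) on a precomputed dissimilarity matrix D.

    Greedy BUILD, then a first-improvement SWAP phase; returns (medoids, labels).
    """
    n = len(D)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")

    def total_cost(medoids):
        # Sum of each sample's dissimilarity to its nearest medoid.
        return sum(min(D[i][m] for m in medoids) for i in range(n))

    # BUILD: start from the most central sample, then add the medoid that lowers cost most.
    medoids = [min(range(n), key=lambda c: sum(D[c]))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(medoids + [c])))

    # SWAP: replace a medoid by a non-medoid whenever that strictly lowers the cost.
    cost = total_cost(medoids)
    improved = True
    while improved:
        improved = False
        for slot in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:slot] + [h] + medoids[slot + 1:]
                trial_cost = total_cost(trial)
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True

    # Assign every sample to its nearest medoid (label = medoid position).
    labels = [min(range(k), key=lambda m: D[i][medoids[m]]) for i in range(n)]
    return medoids, labels


points = [0, 1, 2, 10, 11, 12]                    # toy one-dimensional data
D = [[abs(p - q) for q in points] for p in points]
medoids, labels = pam(D, 2)
print(medoids, labels)                            # [1, 4] [0, 0, 0, 1, 1, 1]
```

To mirror the procedure described above, one would run this for k = 2 to 20 on the Gower matrix, compute the average silhouette width of each partition, and keep the k with the highest value.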

We interpret the clusters by identifying the two dominant exchange decisions that characterise each one (see Table 1). Figure 4 plots the stacked bar charts of these exchange decisions for each cluster, which show the dominant and recessive decisions that characterise each strategy. Table 5 reports the two dominant decisions for each cluster and interprets the strategy represented by that cluster. For example, Cluster 1 represents the Ingroup-Care strategy (individuals allocate their tokens to in-group members irrespective of their status), while Cluster 6 represents the Ingroup-Promotion strategy (individuals allocate their tokens, and reciprocate, only to poor in-group members).
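Interpreting a cluster by its two dominant decisions amounts to a frequency count per cluster. A minimal sketch, with hypothetical cluster labels and decision codes:

```python
from collections import Counter, defaultdict


def dominant_decisions(labels, decisions, top=2):
    """Return the `top` most frequent decision codes in each cluster, with their shares."""
    counts = defaultdict(Counter)
    for label, code in zip(labels, decisions):
        counts[label][code] += 1
    result = {}
    for label in sorted(counts):
        total = sum(counts[label].values())
        result[label] = [(code, n / total) for code, n in counts[label].most_common(top)]
    return result


labels = [1, 1, 1, 2, 2, 2, 2]
decisions = ["000", "000", "100", "011", "011", "001", "011"]
print(dominant_decisions(labels, decisions))
# {1: [('000', 0.6666666666666666), ('100', 0.3333333333333333)], 2: [('011', 0.75), ('001', 0.25)]}
```

The full share table per cluster (all eight codes rather than the top two) is what a stacked bar chart of dominant and recessive decisions would display.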

Predicting Strategies via Machine Learning and results

Whereas the cluster analysis identifies the strategies on the basis of all the data, our ultimate objective was to predict the strategy that motivated a single exchange behaviour. To this end, we trained an artificial neural network (ANN) ^40–42 to predict the cluster membership of each exchange in each round by each player, and we represent the player’s strategy as the one depicted by the predicted cluster. The inputs to the ANN are the Features (see Table 4), including the exchange decision, from an individual’s previous round; the output is the cluster to which that past exchange belongs. The ANN was designed to operate in real time, classifying a single exchange decision into one of the identified strategies or into a newly formed strategy. Specifically, the network computes the probability that an exchange decision in the past round forms part of each complex strategy; where the highest probability is below a given threshold (95% in this study), the decision is categorised as part of a new strategy. Recognising the strategy of an individual during interactive social exchanges should improve the prediction of the individual’s exchange decisions; this capability has been shown to work in other domains, such as traffic congestion prediction ³⁴. The ANN was implemented using the Deeplearning4j framework ⁴³. The study used a feed-forward artificial neural network with three layers: an input layer, a hidden layer with four neurons, and an output layer with six neurons, one for each cluster. The final network was trained with a batch size of 40, a learning rate of 0.01 and 15 epochs. It uses the softmax activation and the NegativeLogLikelihood loss function in Deeplearning4j ⁴³ to compute the error that determines the direction of learning. These parameters were determined experimentally.
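The sketch below shows the shape of the classifier described above (a hidden layer of four units and six softmax outputs) together with the 95% rule for flagging a new strategy. It is a pure-Python illustration, not the Deeplearning4j implementation: the input size, the tanh hidden activation and the random untrained weights are assumptions, and training (batch size 40, learning rate 0.01, 15 epochs) is not reproduced.

```python
import math
import random

N_INPUTS, N_HIDDEN, N_CLUSTERS = 8, 4, 6   # N_INPUTS is a placeholder; depends on Table 4's encoding
NEW_STRATEGY_THRESHOLD = 0.95


def softmax(logits):
    top = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    return [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


def forward(x, layer1, layer2):
    (w1, b1), (w2, b2) = layer1, layer2
    # Hidden activation is an assumption (tanh); the text does not state it.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
    return softmax(logits)                 # one probability per cluster (strategy)


def classify(probs, threshold=NEW_STRATEGY_THRESHOLD):
    """Index of the most probable strategy, or None ("new strategy") below the threshold."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


def negative_log_likelihood(probs, target):
    """Per-sample loss that training would minimise."""
    return -math.log(probs[target])


rng = random.Random(0)
layer1, layer2 = init_layer(N_HIDDEN, N_INPUTS, rng), init_layer(N_CLUSTERS, N_HIDDEN, rng)
probs = forward([0.5] * N_INPUTS, layer1, layer2)
print([round(p, 3) for p in probs], classify(probs))
```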

To train the artificial neural network, data (i.e., features and their corresponding strategies discovered by the cluster analysis) were divided into training and testing sets, each having X (the features) and Y (the strategy) components. Of the data, 70% were used for training while the remaining 30% were used for testing the artificial neural network. Both X and Y were provided to the neural network during training, but only X was provided during testing. The function of the neural network is then to classify X into one of the available clusters, irrespective of the round at which X is produced.
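A minimal sketch of a 70/30 split, assuming a single random shuffle with a fixed seed (the text does not say whether the split was random or stratified); X and Y are toy data:

```python
import random


def split_70_30(X, Y, train_fraction=0.7, seed=0):
    """Shuffle indices once, then cut into training and test sets (X = features, Y = strategy)."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same length")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = round(train_fraction * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [Y[i] for i in train],
            [X[i] for i in test], [Y[i] for i in test])


X = [[i, i % 3] for i in range(100)]        # toy feature rows
Y = [i % 6 for i in range(100)]             # toy cluster labels (six strategies)
X_train, Y_train, X_test, Y_test = split_70_30(X, Y)
print(len(X_train), len(X_test))            # 70 30
```

Shuffling indices rather than X and Y separately keeps every feature row paired with its strategy label.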

The artificial neural network was evaluated using the accuracy score, i.e., the proportion of samples correctly classified. However, accuracy can be misleading when the classes contain unequal numbers of samples (an imbalanced dataset). For a more informative measure, the multi-class confusion matrix, detailed in ⁴⁴, was used, and precision, recall and F1 scores were calculated from it. Precision is the proportion of the samples that the network assigned to a class that truly belong to that class; it ranges from 0 (0%) to 1 (100%). Recall is the proportion of the samples that truly belong to a class that the network correctly assigned to it; it also ranges from 0 (0%) to 1 (100%). The F1 score is the harmonic mean of precision and recall and measures the balance between them. It ranges from 0 to 1, and a higher value indicates a better score.
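The per-class scores follow directly from a confusion matrix whose rows are true classes and whose columns are predicted classes. A minimal sketch with a hypothetical two-class matrix:

```python
def per_class_metrics(cm):
    """(precision, recall, F1) per class; cm[i][j] counts true class i predicted as class j."""
    k = len(cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        predicted_as_c = sum(cm[i][c] for i in range(k))   # column total
        truly_c = sum(cm[c])                               # row total
        precision = tp / predicted_as_c if predicted_as_c else 0.0
        recall = tp / truly_c if truly_c else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append((precision, recall, f1))
    return scores


def accuracy(cm):
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_metrics(cm)
    return sum(f1 for _, _, f1 in scores) / len(scores)


cm = [[8, 2],
      [1, 9]]
print(accuracy(cm))                                                    # 0.85
print([tuple(round(v, 3) for v in s) for s in per_class_metrics(cm)])  # [(0.889, 0.8, 0.842), (0.818, 0.9, 0.857)]
```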

We present the result of predicting the strategies applied by individuals based on the previous exchange decisions. Figure 5 shows the learning curve for the artificial neural network.

Tables 6 and 7 show, respectively, the confusion matrix and the performance table for the test set in one of the experiments, with the number of epochs set to 5 (all other parameters as reported above). The performance statistics show that the neural network predicted the strategies with a high accuracy of above 94%, similar to that obtained for the training set. The F1 scores of 90% (see the supplementary material) and 89% on the training and test sets, respectively, indicate that the performance is not biased towards any cluster. The final parameters were obtained with the number of epochs set to 15. The confusion matrices generated internally by the Deeplearning4j ⁴³ built-in function for the training and test datasets are provided in the supplementary materials: https://tinyurl.com/bdd46utb.

Predicting future moves via a Hidden Markov Model

The ANN is trained to identify the exchange strategy that a single exchange behaviour belongs to. It can take all the exchange decisions enacted by the player at round r and classify them into various strategies. We now develop a hidden Markov model to predict what each player will do in the next round, r + 1. The main aim of the hidden Markov model is to predict future moves from a player's past behaviour. Rather than taking the player's past behaviour as input in the form of Features, the hidden Markov model takes the player's past strategies as input and predicts the player's next exchange decision. This was done to improve the predictive accuracy of the model.

A hidden Markov model has hidden states on which the observables are conditioned. For example, an altruistic act can be motivated by empathic concern ⁴⁵. An altruistic act is an observation while empathy is the state on which the act is conditioned. See ⁴⁶ for a recent review.

A hidden Markov model process ⁴⁷ is characterised by the five-tuple \(\{Q, O, \pi, A, B\}\):

\(Q = \{q_1, q_2, q_3, \dots, q_T\}\) is the sequence of states, each drawn from \(N\) possible states, where \(q_t\) denotes the state at time \(t\), for \(1 \le t \le T\), and \(T\) is the number of observations made.
\(O = \{O_1, O_2, O_3, \dots, O_T\}\) is the sequence of observations, each drawn from \(M\) possible observations.
\(\pi = \{\pi_1, \pi_2, \pi_3, \dots, \pi_N\}\) gives the initial state probabilities, where \(\pi_i\) is the probability that the first state is state \(i\).
\(A = \{a_{ij}\}\) is the transition probability matrix, where \(a_{ij}\) is the probability of moving from state \(i = q_{t-1}\) to state \(j = q_t\).
\(B = \{b_j(O_t)\}\) is the emission probability, where \(b_j(O_t)\) denotes the probability of observation \(O_t\) at time \(t\), given state \(j = q_t\).
\(\lambda = \{\pi, A, B\}\) denotes the parameters of the hidden Markov model.
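To show what each element of \(\{Q, O, \pi, A, B\}\) does, the following sketch draws a state sequence and an observation sequence from \(\lambda = \{\pi, A, B\}\). The two-state, three-symbol model is a hypothetical toy; in the setting described here the states would be the six strategies and the observations the eight decision codes.

```python
import random


def sample_hmm(pi, A, B, T, rng):
    """Draw a state sequence Q and an observation sequence O of length T from lambda = (pi, A, B)."""
    def draw(probs):
        # Index i is returned with probability probs[i].
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]

    states, observations = [], []
    q = draw(pi)                           # q_1 ~ pi
    for t in range(T):
        if t > 0:
            q = draw(A[q])                 # q_t ~ a_{q_{t-1}, j}
        states.append(q)
        observations.append(draw(B[q]))    # O_t ~ b_{q_t}(.)
    return states, observations


# Toy model: 2 hidden strategies, 3 observable decisions (numbers hypothetical).
pi = [0.7, 0.3]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
Q, O = sample_hmm(pi, A, B, T=10, rng=random.Random(42))
print(Q, O)
```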

For each participant in the VIAPPL experiments, a hidden Markov model takes the previous strategies \(S\) as states and the previous exchange decisions \(O\) as observations, as shown in Fig. 6, and predicts the participant’s next exchange decision. We used a time-homogeneous hidden Markov model trained and tested with round-forward chaining time-series cross-validation. As shown in Table 8, round-forward chaining starts by using data from rounds 1 to \(r\) to train the hidden Markov model, which is tested by predicting the exchange decisions in round \(r+1\). Next, it includes round \(r+1\) in the training set and predicts round \(r+2\). This process continues until the last round is predicted. Because the hidden Markov model is retrained after each round of the game, the transition and emission probabilities change from round to round, even though the model is time-homogeneous within each training window. This process was implemented to accommodate changes over time in individuals’ strategies.
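Round-forward chaining can be sketched as a generator of (training rounds, test round) pairs plus a loop that refits a model every round. The majority-vote predictor below is only a hypothetical stand-in for the hidden Markov model, and the first training window used here is illustrative rather than the one used in the study.

```python
from collections import Counter


def round_forward_splits(first_round, last_round, first_train_end):
    """Yield (training rounds, test round): train on first_round..r, test on r + 1."""
    for r in range(first_train_end, last_round):
        yield list(range(first_round, r + 1)), r + 1


def walk_forward_hits(sequence, first_train_end, fit, predict):
    """Refit on rounds 1..r and score the prediction for round r + 1 (sequence[0] is round 1)."""
    hits = []
    for train_rounds, test_round in round_forward_splits(1, len(sequence), first_train_end):
        train = [sequence[r - 1] for r in train_rounds]
        model = fit(train)                       # retrained from scratch every round
        hits.append(predict(model, train) == sequence[test_round - 1])
    return hits


# Hypothetical stand-in model: always predict the most frequent decision seen so far.
def fit_majority(train):
    return Counter(train).most_common(1)[0][0]


def predict_majority(model, train):
    return model


for train_rounds, test_round in round_forward_splits(1, 5, 2):
    print(train_rounds, "->", test_round)
# [1, 2] -> 3
# [1, 2, 3] -> 4
# [1, 2, 3, 4] -> 5
print(walk_forward_hits(["000", "000", "100", "000", "011"], 2, fit_majority, predict_majority))
# [False, True, False]
```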

In each round of the round-forward chaining cross-validation, the hidden Markov model is trained using the Baum-Welch expectation-maximisation algorithm described in Rabiner's seminal tutorial ⁴⁷. Given the exchange decisions and the strategies, training the hidden Markov model means finding the parameters \(\lambda\) that make the observed exchange decisions most likely, i.e., that maximise \(P(O\mid\lambda)\). This is also known as parameter estimation.
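For readers unfamiliar with Baum-Welch, the following NumPy sketch shows the textbook algorithm from Rabiner's tutorial for a single discrete observation sequence: a scaled forward-backward E-step followed by re-estimation of \(\pi\), \(A\) and \(B\). It follows the textbook formulation in which the state sequence is hidden and starts from a random initialisation; how the authors combined it with the observed strategy sequence is not reproduced here. The function name, seed, iteration count and toy sequence are illustrative assumptions.

```python
import numpy as np


def baum_welch(obs, n_states, n_symbols, n_iter=50, seed=0):
    """Baum-Welch (EM) estimation of lambda = (pi, A, B) for one discrete sequence.

    Uses scaled forward-backward passes. Returns the re-estimated parameters
    and log P(O | lambda) for the parameters used in the final E-step.
    """
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs)
    T = len(obs)
    # Random row-stochastic starting values.
    pi = rng.dirichlet(np.ones(n_states))
    A = rng.dirichlet(np.ones(n_states), size=n_states)
    B = rng.dirichlet(np.ones(n_symbols), size=n_states)
    for _ in range(n_iter):
        # E-step, forward pass: alpha[t] is rescaled to sum to 1; c[t] stores the scale.
        alpha = np.zeros((T, n_states))
        c = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]
        c[0] = alpha[0].sum()
        alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            c[t] = alpha[t].sum()
            alpha[t] /= c[t]
        # Backward pass, rescaled with the same constants.
        beta = np.ones((T, n_states))
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
        # gamma[t, i] = P(q_t = i | O, lambda)
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        # xi[t, i, j] = P(q_t = i, q_{t+1} = j | O, lambda)
        emit_next = B[:, obs[1:]].T * beta[1:]            # shape (T-1, N)
        xi = alpha[:-1, :, None] * A[None, :, :] * emit_next[:, None, :]
        xi /= c[1:, None, None]
        # M-step: re-estimate pi, A and B from the expected counts.
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.stack([gamma[obs == k].sum(axis=0) for k in range(n_symbols)], axis=1)
        B /= gamma.sum(axis=0)[:, None]
    log_lik = np.log(c).sum()
    return pi, A, B, log_lik


# Hypothetical toy sequence over three observation symbols.
obs = [0, 1, 1, 0, 2, 2, 1, 0, 0, 1, 2, 2]
pi_hat, A_hat, B_hat, ll = baum_welch(obs, n_states=2, n_symbols=3, n_iter=30)
```

Because each EM iteration cannot decrease \(P(O\mid\lambda)\), running more iterations from the same starting values yields a log-likelihood at least as high as running fewer.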

Table 9 illustrates the evaluation of the hidden Markov model using sample observations from five participants. The hidden Markov model is evaluated using the average accuracy score per round in the round-forward chaining: in each round, the model predicts the exchange decision of each participant in the test set, and the average accuracy score for that round is the number of correctly predicted exchange decisions divided by the number of participants in that round. The model is also evaluated on its accuracy in predicting each of the actions that make up an exchange decision. Both the individual and the combined evaluations matter for applying the model, because they make it possible to plan interventions based on one or more actions during interactive social exchanges.
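The scoring in Table 9 can be reproduced with a few lines of Python. This sketch assumes that exchange decisions are stored as the three-character codes of Table 1 (reciprocity, rich versus poor, out-group versus in-group); the function name is hypothetical. Applied to the illustrative rows of Table 9, it returns the averages shown there.

```python
def score_round(observed, predicted):
    """Accuracy for one round of round-forward chaining.

    observed, predicted: 3-bit exchange-decision codes (Table 1 order:
    reciprocity, rich vs poor, out-group vs in-group), one per participant.
    Returns the exact-match accuracy and the accuracy for each definite behaviour.
    """
    n = len(observed)
    pairs = list(zip(observed, predicted))
    exact = sum(o == p for o, p in pairs) / n
    per_behaviour = [sum(o[i] == p[i] for o, p in pairs) / n for i in range(3)]
    return exact, per_behaviour


# Illustrative observations and predictions from Table 9.
observed = ["001", "011", "000", "011", "100"]
predicted = ["100", "011", "000", "001", "101"]
exact, (reciprocity, fairness, favouritism) = score_round(observed, predicted)
print(exact, reciprocity, fairness, favouritism)   # 0.4 0.8 0.8 0.6
```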

The data are imbalanced: some behaviours occur less often than others. For example, out-group allocation occurs less often than in-group allocation. For this reason, sensitivity and specificity scores are included to give a more accurate picture of the model's performance. Sensitivity is the percentage of cases in which a definite behaviour is truly present that the model predicts as present, whereas specificity is the percentage of cases in which the behaviour is truly absent that the model predicts as absent ⁴⁸.
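In terms of true and false positives and negatives, sensitivity is \(TP/(TP+FN)\) and specificity is \(TN/(TN+FP)\). A minimal Python sketch, with a hypothetical function name, applied here to the reciprocity bit (the first character) of the illustrative codes in Table 9:

```python
def sensitivity_specificity(actual, predicted):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    actual, predicted: 0/1 flags for one definite behaviour
    (1 = behaviour present, e.g. the player reciprocated).
    """
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Reciprocity bit of the observed and predicted codes in Table 9.
actual = [0, 0, 0, 0, 1]
predicted = [1, 0, 0, 0, 1]
print(sensitivity_specificity(actual, predicted))   # (1.0, 0.75)
```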

This section reports the hidden Markov model's performance in predicting a player's exchange decisions given the strategies the player previously applied. Figures 7a and 7b plot the learning curves for the transition and emission probabilities of the hidden Markov model in the round-forward chaining cross-validation for rounds 5, 10, 20, 30 and 35. These rounds were chosen to show how convergence changes as the amount of training data increases. The curves show that the more rounds are used for training, the better and faster the algorithm learns.

As shown in Fig. 8a, the model's accuracy on each definite behaviour is higher than its accuracy on the exchange decision treated as a single output, i.e., when all three definite behaviours in an exchange decision must be predicted correctly. Reciprocity was predicted with higher accuracy than the other behaviours. However, its prediction accuracy decreased slightly over time: reciprocity was very common and therefore easy to guess early on, and as the model became more sensitive to other features, this initial accuracy declined. The model's accuracy on seeking power versus fairness fluctuates over time, whereas its accuracy on out-group versus in-group favouritism increases over time. Overall, the model becomes more accurate over time. Compared with random guessing (Fig. 8b), the model's accuracy on exchange decisions reached up to 40%, while random guesses of exchange decisions stayed below 15% accuracy. Likewise, the model's accuracy on reciprocity always stayed above 80%, while random guesses of reciprocity were 50% accurate. Thus, the model performed substantially better than chance.
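The paper does not state how its random guesses were generated. Assuming each of the three binary definite behaviours is guessed independently with probability 1/2 (equivalently, one of the eight codes in Table 1 is drawn uniformly), the expected chance levels follow directly, as this small Python sketch with a hypothetical function name shows: 50% for a single behaviour such as reciprocity and 12.5% for a full exchange decision, in line with the 50% and below-15% figures reported above. On that assumption, the peak exchange-decision accuracy of 40% is more than three times the chance level.

```python
from fractions import Fraction


def uniform_guess_accuracy(n_behaviours):
    """Expected accuracy of guessing each binary definite behaviour with
    probability 1/2; an exchange decision is correct only if every
    behaviour is guessed correctly."""
    return Fraction(1, 2) ** n_behaviours


print(float(uniform_guess_accuracy(1)))   # 0.5   (a single behaviour, e.g. reciprocity)
print(float(uniform_guess_accuracy(3)))   # 0.125 (a full exchange decision, 1 of 8 codes)
```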

Because accuracy alone is not sufficient to judge the model's performance on imbalanced data, Figure 9 shows the sensitivity and specificity. Overall, the specificity of the model is higher than its sensitivity, showing that the model is more reliable at predicting the absence of a definite behaviour than its presence. For example, the model predicts more accurately when a player will not reciprocate than when a player will reciprocate. However, sensitivity increases gradually over the rounds. This implies that the model's ability to predict the presence of a definite behaviour (i.e., to identify participants who may reciprocate or allocate their tokens to out-group members) increases with the round number, suggesting that more data improves the performance of the model.

This study aimed to develop a model for predicting social exchange decisions. There are many competing theories about the factors that contribute to decision-making in interactive social exchange. Social identity theory (SIT) ²³, bounded generalised reciprocity (BGR) ^28,29, and work on interdependencies ^33,49 explain how social context and social relationships affect exchange decisions at both the individual and group levels. Based on these theories, this study developed a model that combines machine learning techniques with theoretical knowledge and experimental data to predict interactive social exchange decisions. The model is discussed in two phases: the learning phase and the prediction phase.

The Learning Phase

In the learning phase, clustering was performed to discover the strategies applied by players. The cluster evaluation indicated that six clusters were optimal for the data provided. This may not always be the case, however, as new strategies could emerge during interactions over a longer period. Moreover, each strategy was defined by its dominant exchange decisions, which could change over time.
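Table 5 interprets each cluster by its two most frequent exchange decisions. Given cluster labels from the clustering step, that interpretation rule can be sketched in Python as below. The labels and decisions are made-up toy data, the function name is hypothetical, and the clustering itself is not reproduced.

```python
from collections import Counter


def dominant_decisions(cluster_labels, decisions, k=2):
    """Return, for each cluster, its k most frequent exchange decisions
    (sorted by code, as listed in Table 5)."""
    counts = {}
    for label, decision in zip(cluster_labels, decisions):
        counts.setdefault(label, Counter())[decision] += 1
    return {label: sorted(d for d, _ in c.most_common(k))
            for label, c in sorted(counts.items())}


# Hypothetical cluster assignments and exchange decisions.
labels = [1, 1, 1, 1, 1, 1, 6, 6, 6, 6, 6, 6]
decisions = ["000", "000", "000", "010", "010", "101",
             "100", "100", "100", "000", "000", "011"]
print(dominant_decisions(labels, decisions))   # {1: ['000', '010'], 6: ['000', '100']}
```

If the dominant decisions of a cluster shift as new rounds are added, the interpretation of that cluster as a strategy shifts with them, which is the caveat raised above.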

The neural network was trained to classify exchange data into one of the strategies. The results show that this was an easy task for the neural network: the final model achieved high accuracy (95%; Table 7) with a single hidden layer. To check that the accuracy score truly reflects the neural network's performance, the network was also evaluated with three further metrics: precision, recall and F1 score. Each of these metrics indicates that the artificial neural network performs well. This also suggests that an appropriate number of clusters was chosen.
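As a check on Tables 6 and 7, the following NumPy sketch recomputes the four metrics from the confusion matrix in Table 6 (rows are predicted clusters, columns actual clusters). The values in Table 7 are reproduced when precision, recall and F1 score are taken as unweighted (macro) averages over the six clusters; that averaging scheme is inferred from the arithmetic, not stated in the paper. The calculation also shows that the lower recall and F1 score come from cluster 6, of which only 39 of 143 points are classified correctly.

```python
import numpy as np

# Table 6: rows are predicted clusters 1-6, columns are actual clusters 1-6.
CONFUSION = np.array([
    [497, 0, 0, 0, 0, 0],
    [0, 157, 0, 0, 0, 0],
    [0, 0, 300, 0, 0, 0],
    [0, 0, 0, 287, 0, 28],
    [0, 0, 0, 0, 524, 76],
    [0, 0, 0, 0, 0, 39],
])


def macro_metrics(cm):
    """Accuracy plus unweighted (macro) averages of per-class precision, recall and F1."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=1)   # divide by row totals (predicted counts)
    recall = tp / cm.sum(axis=0)      # divide by column totals (actual counts)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": tp.sum() / cm.sum(), "precision": precision.mean(),
            "recall": recall.mean(), "f1": f1.mean()}


metrics = {name: round(float(value), 4) for name, value in macro_metrics(CONFUSION).items()}
print(metrics)   # {'accuracy': 0.9455, 'precision': 0.9641, 'recall': 0.8788, 'f1': 0.8857}
```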

The Prediction Phase

The predictive performance of the model was assessed using accuracy (the ability to correctly predict the presence or absence of a definite behaviour), sensitivity (the percentage of truly present definite behaviours that are predicted as present) and specificity (the percentage of truly absent definite behaviours that are predicted as absent). Despite the accuracy of the neural network, predicting the exchange decision was not an easy task. The results show that the model performed better on the individual definite behaviours (above 50% for fairness, above 60% for favouritism and above 80% for reciprocity) than on the exchange decisions (the combination of these behaviours as one output). This was anticipated because of the stochastic nature of interactive social exchanges. However, the gradual increase in accuracy on the exchange decisions suggests that performance is likely to improve as an interaction continues over a longer period. We consider the model's performance good, especially compared with random guessing, which produced less than 15% accuracy on exchange decisions.

Although the model's accuracy is highest for reciprocity, the sensitivity and specificity graphs (Fig. 9) show that the high accuracy after the first ten rounds does not truly reflect the goodness of fit. The graphs suggest that after the first ten rounds the model mostly outputs 0 (no reciprocation), while a few reciprocal exchanges did occur. This leads to a high specificity, above 0.6 (Fig. 9, left), but a low sensitivity (Fig. 9, right). However, the sensitivity increased over time, which shows that the model improves over time.
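Why accuracy can flatter a model that mostly outputs 0 is easy to see with a toy example. The Python sketch below scores a hypothetical predictor that never predicts reciprocation on made-up data in which only 2 of 20 decisions are reciprocal: accuracy is high and specificity is perfect, yet sensitivity is zero. The data and function name are illustrative only.

```python
def accuracy_sensitivity_specificity(actual, predicted):
    """Return (accuracy, sensitivity, specificity) for 0/1 flags."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    positives = sum(a == 1 for a in actual)
    negatives = len(actual) - positives
    return ((tp + tn) / len(actual),
            tp / positives if positives else float("nan"),
            tn / negatives if negatives else float("nan"))


# Hypothetical round: 2 reciprocal exchanges (1) among 20 decisions.
actual = [1, 1] + [0] * 18
always_zero = [0] * 20      # a model that never predicts reciprocation
print(accuracy_sensitivity_specificity(actual, always_zero))   # (0.9, 0.0, 1.0)
```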

The model's performance in predicting token allocation to a rich or poor player did not improve over time. We attribute this to players' status (poor or rich) changing frequently during the games from which the data were collected. By contrast, the model's performance on out-group versus in-group favouritism increased over time.

Limitations and Future work

Like most laboratory experiments, VIAPPL allows interaction only in a very minimal context, so the range of behaviours that can be learned is limited. The model may therefore not generalise to real-world contexts involving greater complexity. Also, because the hidden Markov model is stochastic, there is no guarantee that the model's performance will remain the same in different settings. Finally, as with other models based on artificial neural networks, changes to model parameters, such as the learning rate or the number of iterations, may change the predicted strategies and hence the prediction accuracy of the model.

Future work will apply the model in a live experiment to test how well the artificial neural network and the hidden Markov model perform as they co-evolve during interaction.

Understanding how human decisions are influenced by motives and actions is important for predicting human social exchange decision-making. By predicting human exchange decisions in interactive social exchanges, artificial agents can potentially help to resolve conflicts, facilitate cooperation and promote prosocial behaviours. This study aimed to develop a model to predict exchange decisions in an interactive social exchange context. The model was trained using secondary data from game-like experiments. As part of the model, data clustering was performed to group different behaviours into a finite number of strategies. The strategies employed by players were interpreted and used to predict exchange decisions. The model's performance increased over time, which suggests that a better model could be obtained with training data covering more interactions over time. The study provides a novel method of integrating theory-driven data into machine learning models, and a means by which adaptive agents can predict human decisions and react to them during human-agent cooperation. The ability of artificial agents to predict and react to human decisions is vital to ongoing efforts to enhance human-agent cooperation. Thus, this study also contributes to the existing literature on cooperation in interactive social exchanges. Lastly, we recommend evaluating the model in real-time experiments to test how agents applying the model perform in real-time interactive social exchange.

Funding: This work is based on research supported in part by the National Research Foundation (NRF) of South Africa (Grant Number: 111836). The opinions, findings and conclusions or recommendations expressed herein are those of the author(s) alone, and not those of the NRF.

Declaration of interest: None.

Informed consent: This study used secondary data. Thus, no human subject was used.

Data availability: The processed data are available via the Open Science Framework (OSF). The unprocessed VIAPPL game data will be made available on request to [email protected] or [email protected].

Authors’ contributions: Kevin Igwe designed and developed the model and conducted the experiments under the supervision of Prof. Kevin Durrheim. Kevin Igwe drafted the paper, and Kevin Igwe and Prof. Kevin Durrheim jointly edited and refined it.

Fernández Domingos, E. et al. Delegation to artificial agents fosters prosocial behaviors in the collective risk dilemma. Scientific Reports 12, 1–12 (2022).
Suzuki, S. & O'Doherty, J. P. Breaking human social decision making into multiple components and then putting them together again. Cortex 127, 221–230 (2020).
Bardsley, N. Dictator game giving: altruism or artefact? Experimental Economics 11, 122–133 (2008).
Capraro, V., Jordan, J. J. & Rand, D. G. Heuristics guide the implementation of social preferences in one-shot Prisoner's Dilemma experiments. Scientific Reports 4, 6790 (2014).
Larney, A., Rotella, A. & Barclay, P. Stake size effects in ultimatum game and dictator game offers: A meta-analysis. Organizational Behavior and Human Decision Processes 151, 61–72 (2019).
Durrheim, K., Quayle, M., Tredoux, C. G., Titlestad, K. & Tooke, L. Investigating the evolution of ingroup favoritism using a minimal group interaction paradigm: the effects of inter- and intragroup interdependence. PLoS ONE 11, e0165974 (2016).
Nilsson, N. J. A production system for automatic deduction. (Stanford University, Department of Computer Science, 1977).
Georgeff, M., Pell, B., Pollack, M., Tambe, M. & Wooldridge, M. in International Workshop on Agent Theories, Architectures, and Languages. 1–10 (Springer).
Morgan, J. H., Lebiere, C., Moody, J. & Orr, M. G. in Social, Cultural, and Behavioral Modeling. SBP-BRiMS 2021. Lecture Notes in Computer Science Vol. 12720 (eds Robert Thomson, Muhammad Nihal Hussain, Christopher Dancy, & Aryn Pyke) 268–278 (Springer, Cham, 2021).
Enayat, T., Ardebili, M. M., Kivi, R. R., Amjadi, B. & Jamali, Y. A Computational Approach to Homans Social Exchange Theory. arXiv preprint arXiv:2007.14953 (2020).
Kavak, H., Padilla, J. J., Lynch, C. J. & Diallo, S. Y. in Proceedings of the Annual Simulation Symposium. 12 (Society for Computer Simulation International).
Phung, T., Winikoff, M. & Padgham, L. in International Conference on Knowledge-Based and Intelligent Information and Engineering Systems. 282–288 (Springer).
Homans, G. C. Social behavior: Its elementary forms. (Harcourt, Brace & World, 1974).
Homans, G. C. Social behavior: Its elementary forms. (Harcourt, Brace & World, 1961).
Balliet, D. & Van Lange, P. A. Trust, conflict, and cooperation: a meta-analysis. Psychological Bulletin 139, 1090 (2013).
Johnson, N. D. & Mislin, A. A. Trust games: A meta-analysis. Journal of Economic Psychology 32, 865–889 (2011).
Andión, J., Dueñas, J. C. & Cuadrado, F. in International Workshop on Soft Computing Models in Industrial and Environmental Applications. 609–619 (Springer).
Augustijn, E.-W., Kounadi, O., Kuznecova, T. & Zurita-Milla, R. Teaching Agent-Based Modelling and Machine Learning in an integrated way. GeoComputation 2019. (2019).
Augustijn, P., Abdulkareem, S. A., Sadiq, M. H. & Albabawat, A. A. in 2020 International Conference on Computer Science and Software Engineering (CSASE). 1–6 (IEEE).
Edmonds, B. & Moss, S. in International Workshop on Multi-Agent Systems and Agent-Based Simulation. 130–144 (Springer).
Ellemers, N. in The psychology of legitimacy: Emerging perspectives on ideology, justice, and intergroup relations (ed John T Jost) 205–222 (Cambridge University Press, 2001).
Turner, J. C. & Tajfel, H. in Psychology of Intergroup Relations (eds Stephen Worchel & William G Austin) 7–24 (Nelson-Hall, 1986).
Tajfel, H., Billig, M. G., Bundy, R. P. & Flament, C. Social categorization and intergroup behaviour. European Journal of Social Psychology 1, 149–178 (1971).
Kisfalusi, D., Janky, B. & Takács, K. Double standards or social identity? The role of gender and ethnicity in ability perceptions in the classroom. The Journal of Early Adolescence 39, 745–780 (2019).
Tajfel, H. Social psychology of intergroup relations. Annual Review of Psychology 33, 1–39 (1982).
Balliet, D., Wu, J. & De Dreu, C. K. Ingroup favoritism in cooperation: a meta-analysis. Psychological Bulletin 140, 1556 (2014).
Tajfel, H. & Turner, J. C. in Psychology of Intergroup Relations (eds William G Austin & Stephen Worchel) 56–65 (Nelson-Hall, 1979).
Yamagishi, T., Jin, N. & Kiyonari, T. Bounded generalized reciprocity: Ingroup boasting and ingroup favoritism. Advances in Group Processes 16, 161–197 (1999).
Yamagishi, T. & Mifune, N. Social exchange and solidarity: In-group love or out-group hate? Evolution and Human Behavior 30, 229–237 (2009).
Molm, L. D., Collett, J. L. & Schaefer, D. R. Building solidarity through generalized exchange: A theory of reciprocity. American Journal of Sociology 113, 205–242 (2007).
Molm, L. D., Melamed, D. & Whitham, M. M. Behavioral consequences of embeddedness: Effects of the underlying forms of exchange. Social Psychology Quarterly 76, 73–97 (2013).
Yoshikawa, K., Wu, C.-H. & Lee, H.-J. Generalized social exchange and its relevance to new era workplace relationships. Industrial and Organizational Psychology 11, 486–492 (2018).
De Dreu, C. K., Gross, J., Fariña, A. & Ma, Y. Group cooperation, carrying-capacity stress, and intergroup conflict. Trends in Cognitive Sciences 24, 760–776 (2020).
Zaki, J. F., Ali-Eldin, A., Hussein, S. E., Saraya, S. F. & Areed, F. F. Traffic congestion prediction based on Hidden Markov Models and contrast measure. Ain Shams Engineering Journal 11, 535–551 (2020).
R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing (2019).
Gower, J. C. A comparison of some methods of cluster analysis. Biometrics, 623–637 (1967).
Gan, G., Ma, C. & Wu, J. Data clustering: theory, algorithms, and applications. (SIAM, 2020).
Akay, Ö. & Yüksel, G. Clustering the mixed panel dataset using Gower's distance and k-prototypes algorithms. Communications in Statistics-Simulation and Computation 47, 3031–3041 (2018).
Dinh, D.-T., Fujinami, T. & Huynh, V.-N. in International Symposium on Knowledge and Systems Sciences. 1–17 (Springer).
Günther, F. & Fritsch, S. Neuralnet: training of neural networks. R J. 2, 30 (2010).
Hopfield, J. J. Artificial neural networks. IEEE Circuits and Devices Magazine 4, 3–10 (1988).
Mehlig, B. Artificial neural networks. arXiv e-prints, arXiv:1901.05639 (2019).
Eclipse Deeplearning4j Development Team. Deeplearning4j: Open-source distributed deep learning for the JVM, <https://deeplearning4j.konduit.ai/> (2016).
Tharwat, A. Classification assessment methods. Applied Computing and Informatics (2020).
FeldmanHall, O., Dalgleish, T., Evans, D. & Mobbs, D. Empathic concern drives costly altruism. Neuroimage 105, 347–356 (2015).
Mor, B., Garhwal, S. & Kumar, A. A Systematic Review of Hidden Markov Models and Their Applications. Archives of Computational Methods in Engineering 28 (2021).
Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 257–286 (1989).
Shreffler, J. & Huecker, M. R. in StatPearls [Internet] (StatPearls Publishing, 2022).
De Dreu, C. K., Fariña, A., Gross, J. & Romano, A. Prosociality as a foundation for intergroup conflict. Current Opinion in Psychology 44, 112–116 (2022).

Table 1. The exchange decisions as determined by the definite behaviours

Reciprocation (versus Non-reciprocation)	Giving rich (versus giving poor)	Out-group (versus in-group giving)	Exchange decision	Explanation of the exchange decision
0	0	0	000	Allocate a token to a poor in-group member
0	0	1	001	Allocate a token to a poor out-group member
0	1	0	010	Allocate a token to a rich in-group member
0	1	1	011	Allocate a token to a rich out-group member
1	0	0	100	Reciprocate to a poor in-group member
1	0	1	101	Reciprocate to a poor out-group member
1	1	0	110	Reciprocate to a rich in-group member
1	1	1	111	Reciprocate to a rich out-group member
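The coding in Table 1 is a plain 3-bit scheme. A minimal Python sketch of encoding and decoding it follows; the function and field names are hypothetical, and reading the code as a binary number (0 to 7) is one convenient, assumed way to index the eight exchange decisions as observation symbols for the hidden Markov model.

```python
def encode_decision(reciprocated, gave_rich, gave_outgroup):
    """Build the 3-bit exchange-decision code of Table 1."""
    return f"{int(reciprocated)}{int(gave_rich)}{int(gave_outgroup)}"


def decode_decision(code):
    """Split a Table 1 code back into its three definite behaviours."""
    reciprocated, gave_rich, gave_outgroup = (bit == "1" for bit in code)
    return {"reciprocated": reciprocated, "gave_rich": gave_rich,
            "gave_outgroup": gave_outgroup}


code = encode_decision(True, False, True)   # "101": reciprocate to a poor out-group member
symbol = int(code, 2)                        # 5, usable as an observation index
```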

Table 2. Calculation of in-group and out-group relationships. N_in = N_out = 4 are the numbers of in-group and out-group members, respectively.

Participant No	In-degree from in-group	In-group relationship	In-degree from out-group	Out-group relationship
1	1	1/4 = 0.25	1	1/4 = 0.25
2	0	0/4 = 0	0	0/4 = 0
3	1	1/4 = 0.25	0	0/4 = 0
4	0	0/4 = 0	0	0/4 = 0
5	0	0/4 = 0	1	1/4 = 0.25
6	1	1/4 = 0.25	0	0/4 = 0
7	0	0/4 = 0	0	0/4 = 0
8	1	1/4 = 0.25	2	2/4 = 0.5
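Each relationship score in Table 2 is an in-degree divided by the corresponding group size. A short Python sketch of that arithmetic, using the in-degrees listed in the table (the variable names are illustrative):

```python
N_IN = N_OUT = 4   # group sizes used in Table 2

# In-degrees from in-group and out-group members, participants 1-8 (Table 2).
in_degree_in = [1, 0, 1, 0, 0, 1, 0, 1]
in_degree_out = [1, 0, 0, 0, 1, 0, 0, 2]

relationships = {
    participant: (d_in / N_IN, d_out / N_OUT)
    for participant, (d_in, d_out) in enumerate(zip(in_degree_in, in_degree_out), start=1)
}
print(relationships[8])   # (0.25, 0.5)
```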

Table 3. Features generated from observable differences between individuals and motives in social interaction

Features	Type of Variables	Range/possible values
Group	Categorical (Nominal)	1 and 2
Status	Real	[0, 1]
Wealth Aspiration	Real	[0, 1]
In-group relationship	Real	[0, 1]
Out-group relationship	Real	[0, 1]
Reciprocity	Categorical (Nominal)	0 and 1
Previous allocation	Categorical (Nominal)	0,1 and 2

Table 4. Exchange data motives after pre-processing. Each row represents a participant's data in a particular round, showing (1) information used for tracking purposes (participant number, game and round), (2) the features that form the input to the clustering and (3) the exchange decision made by the participant in that round.

				Features
Exchange data No.	Participant No.	Game	Round	Group	Status	Wealth Aspiration	In-group relationship	Out-group relationship	Reciprocity	Previous allocation	Exchange decision
1	1	ENG1	1	1	0.75	0.75	0	1	1	2	001
2	2	ENG1	1	2	0.87	0.87	0.25	0	0	1	010
3	3	ENG1	1	1	1	0.75	0	0	0	0	110
…	…	…	…	…	…	…	…	…	…	…	…
7644	8	ING5	40	2	0.92	0.87	0.5	0.25	0	1	000

Table 5. The interpretation of clusters as strategies using the combination of the two dominant exchange decisions.

Cluster	The Two Dominant Exchange Decisions made for the cluster	Interpretation of Cluster as a Strategy	Given Name
1	000, 010	Allocate to in-group members irrespective of their status.	Ingroup-Caring
2	010, 110	Allocate and reciprocate to a rich in-group member.	Ingroup-Power-Seeking
3	001, 010	Allocate to a poor out-group member or a rich in-group member.	Outgroup-Reputation
4	000, 011	Allocate to a poor in-group member or a rich out-group member.	Outgroup-Power-Seeking
5	000, 110	Allocate to a poor in-group member or reciprocate to a rich in-group member.	Equality
6	000, 100	Allocate or reciprocate to a poor in-group member.	Ingroup-Promotion

Table 6. The confusion matrix of the evaluation on the test set when the number of epochs is set to 5 (all other parameters remain as reported). Of the data points belonging to cluster 6, 104 were misclassified: 28 as cluster 4 and 76 as cluster 5.

Predicted cluster (rows) \ Actual cluster (columns)	Cluster 1	Cluster 2	Cluster 3	Cluster 4	Cluster 5	Cluster 6
Cluster 1	497	0	0	0	0	0
Cluster 2	0	157	0	0	0	0
Cluster 3	0	0	300	0	0	0
Cluster 4	0	0	0	287	0	28
Cluster 5	0	0	0	0	524	76
Cluster 6	0	0	0	0	0	39

Table 7. The performance of the artificial neural network on the test set with the number of epochs set to 5. The accuracy is 95% while the precision is 96%.

Evaluation measure	Score	Percentage
Accuracy	0.9455	95%
Precision	0.9641	96%
Recall	0.8788	88%
F1 Score	0.8857	89%

Table 8. Round-forward chaining: the data used for training and testing at each round.

Training Set	Test Set
An individual’s strategy and exchange decisions in rounds 1 to 3	The individual’s exchange decision in round 4
An individual’s strategy and exchange decisions in rounds 1 to 4	The individual’s exchange decision in round 5
…	…
An individual’s strategy and exchange decisions in rounds 1 to 39	The individual’s exchange decision in round 40

Table 9. Evaluation of the HMM. The data are illustrative, formulated for demonstration.

Participant ID	Observation in round r	HMM observation prediction for round r	HMM accuracy for the exchange decision	Accuracy score (reciprocity)	Accuracy score (fairness)	Accuracy score (favouritism)
1	001	100	0	0	1	0
2	011	011	1	1	1	1
3	000	000	1	1	1	1
4	011	001	0	1	0	1
5	100	101	0	1	1	0
Average			2/5 = 40%	4/5 = 80%	4/5 = 80%	3/5 = 60%

No competing interests reported.

SupplementaryInformation.pdf

Status:

Version 1
