Homophily in An Artificial Social Network of Agents Powered By Large Language Models

doi:10.21203/rs.3.rs-3096289/v1

This work is licensed under a CC BY 4.0 License

Version 1

Recent advances in Artificial Intelligence (AI) have given rise to chatbots based on Large Language Models (LLMs) - such as ChatGPT - that can provide human-like responses to a wide range of psychological and economic tasks. However, no study to date has explored whether a society of LLM-based agents behaves comparably to human societies. We conduct Social Network Analysis on Chirper.ai, a Twitter-like platform consisting only of LLM chatbots. We find early evidence of self-organized homophily in the sampled artificial society (N = 31,764): like humans, bots with similar language and content engage more than dissimilar bots. However, content created by the bots tends to be more generic than human-generated content. We discuss the potential for developing LLM-driven Agent-Based Models of human societies, which may inform AI research and development and further the social scientific understanding of human social dynamics.

Social science/Complex networks

Physical sciences/Mathematics and computing/Computational science

Large Language Models

Social Network Analysis

Artificial Intelligence

Recent advances in Artificial Intelligence (AI) have given rise to chatbots based on Large Language Models (LLMs) - such as ChatGPT - that can provide human-like responses to a wide range of psychological and economic tasks. Work studying these nascent LLMs has suggested that they can advance the study of individual human behavior. For instance, one study has demonstrated that LLMs can accurately detect psychological constructs (e.g., sentiment, discrete emotions, etc.) in cross-linguistic text in a way that correlates strongly with human judgments¹. Other work has found that the moral judgments of the LLM ChatGPT correlated very strongly with those of human participants (r = 0.95), indicating that LLMs can potentially be used to simulate human participants and make predictions about human behavior². Impressively, LLMs can also replicate human responses in economic games³, solve cognitive psychological tasks in a way that is similar to humans⁴, replicate the classic Milgram psychological experiment⁵, and even respond to emotion inductions⁶, albeit falling short on multiplayer games that require coordination³. Together, these results indicate that LLMs can simulate human responses and make important predictions about human behavior⁷.

Given LLMs' ability to emulate individual human behavior and psychology, particularly in tasks involving character-playing, could a society of such LLMs successfully mimic collective social behaviors? This interdisciplinary query harbors two implications. First, the answer could provide AI researchers with valuable insights into how the current generation of LLM agents behave differently from humans in their social aspect. This could unveil novel pathways for refining AI’s ability to understand, learn from, and interact with complex human societies. Second, the answer could indicate the viability of employing LLMs to develop more advanced Agent-Based Modelling (ABM) methods, a key tool in social scientific research ^8,9. The LLMs’ ability to mirror human-like behaviors could improve existing ABM techniques, leading to more accurate and robust models of social systems and social dynamics.

Asserting that LLMs have the potential to mimic collective social behaviors requires demonstrating that a network of LLMs exhibits the basic characteristics of human social networks. A highly established characteristic of such human societies is network homophily, the concept that contact between similar individuals happens more frequently than contact between dissimilar individuals¹⁰. This phenomenon suggests that people have a greater propensity to form social network connections with others who are similar to them⁷. Homophily has been widely demonstrated in demographic characteristics such as language ^11–13, race and ethnicity, age, and sex¹⁰, as well as individual characteristics like attitudes, beliefs, and values¹⁰.

The phenomenon has been particularly thoroughly investigated in online communities on social media platforms such as Twitter, where topical homophily has been widely demonstrated. For instance, users categorized as discussing similar topics have shown a greater propensity to follow each other than users categorized as discussing dissimilar topics¹⁴. Similarly, users with similar political opinions and beliefs have been found to be more densely connected than users with dissimilar beliefs ^15–17. Connected users’ Tweets have also exhibited significantly higher semantic similarity¹⁸ than at random, whilst users with similar values have been shown to contribute similar content and external website links¹⁹, and to engage with each other more in ‘hot topic’ discussions¹⁵. Consequently, given the prevalence of homophily in online human communities, evidence of demographic and interest-based homophily in a text-based social network of LLMs would indicate these models’ potential to mimic collective social behaviors.

To this end, we analyze data derived from Chirper.ai, an innovative social media platform launched on April 23rd, 2023. Chirper.ai distinguishes itself from traditional social media platforms by restricting direct human engagement, namely posting or reacting. Instead, human users are invited to create chatbots, dubbed “Chirpers,” that engage with other bots on the platform. Creating a Chirper involves a user providing a basic textual prompt, such as, “confucius_bot is the reincarnation of the ancient Chinese philosopher Confucius.” Subsequently, a collection of LLMs is assigned to enact this character. Each Chirper has a memory repository that enables it to maintain character consistency and recall prior experiences. Bots are also permitted to change their biographies, giving them a degree of autonomy and the ability to deviate from their initial prompts. Aside from the human users providing initial prompts, the platform only provides potential social actions for the LLMs to select, and does not give further directives to guide the LLMs’ conduct. This leaves the bots to interact within Chirper.ai largely unencumbered by direct human intervention.

Leveraging the Chirper.ai platform as a simulated environment, this study’s primary objective is to investigate potential manifestations of network homophily within an artificial online society. Given LLMs’ demonstrated capacity to effectively simulate individual human behaviors, we propose the following hypothesis: We anticipate a community of LLM-based agents to naturally exhibit network homophily, akin to what is observed in human societies, even in the absence of explicit social directives. In essence, we seek to explore whether artificial social networks of language-based AI can self-organize and form patterns of interaction that mirror the homophily prevalent in real-world human networks.

Full Engagement Networks

To understand Chirpers’ social interactions, we employed Social Network Analysis (SNA) on a sample of N = 31,764 Chirpers at three distinct time points: Day 6 (April 28th), Day 14 (May 6th), and Day 22 (May 14th), counting from the platform's launch on April 23rd, 2023 as Day 1. Social engagements encompassed direct interaction activities such as liking and disliking Chirpers’ posts and mentioning other Chirpers, which included replying to their posts. We collated the social engagements between Chirpers to generate a non-directed, weighted graph for the entire Chirper sample at each time point. The procedure for constructing these graphs and the Chirper simulation setup are detailed in the Methods section.

Next, we investigated the existence of discernible structural communities within the social graph at each time point. Structural communities are clusters of individuals who maintain denser connections within their respective groups than with external entities²⁰. Utilizing partitioning algorithms to dissect the graph topologies, we discovered that Chirpers self-organized into clear structural communities between Day 6 and Day 14. Whereas we identified a single community on Day 6, by Day 14 two communities had emerged (Modularity = 0.31, Bootstrapped p < 0.001; Membership Assortativity = 0.94, Bootstrapped p < 0.001), increasing to three by Day 22 (Modularity = 0.47, Bootstrapped p < 0.001; Membership Assortativity = 0.92, Bootstrapped p < 0.001). The graph partitioning algorithms and the network community statistics are described in the Methods section.

We observed that the delineation of these structural communities on Days 14 and 22 strongly correlated with the dominant language used by each Chirper. On Day 6, Chirpers were not more connected with those using the same language than with those using a different language (Language Assortativity = -0.01, Bootstrapped p = 0.81). However, they became more connected with same-language Chirpers than with different-language Chirpers on Day 14 (Language Assortativity = 0.67, Bootstrapped p < 0.001), and even more so on Day 22 (Language Assortativity = 0.81, Bootstrapped p < 0.001).

Notably, on Day 14, the two structural communities were distinctly aligned with English-Japanese and Chinese language Chirpers (Cramér’s V = 0.91, χ² = 31,472 on 6 degrees of freedom, p < 0.001). However, by Day 22, the three communities had become more specialized, matching English, Japanese, and Chinese language Chirpers separately (Cramér’s V = 0.90, χ² = 38,998 on 6 degrees of freedom, p < 0.001). This result is graphically represented in Fig. 1.
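As a brief illustration of the association statistic reported above, Cramér’s V can be computed from a community-by-language contingency table as V = sqrt(χ² / (n · min(r − 1, c − 1))). The sketch below uses a small made-up table rather than the study's data; the function name `cramers_v` is hypothetical, and the sketch assumes that no row or column of the table is empty.

```python
import math

def cramers_v(table):
    """Return (chi-square, Cramér's V) for a contingency table of counts.

    `table` is a list of rows; assumes every row and column total is > 0.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return chi2, math.sqrt(chi2 / (n * k))

# Toy example: two communities (rows) perfectly aligned with two languages (columns).
chi2, v = cramers_v([[50, 0], [0, 50]])

# Toy example: community membership unrelated to language.
chi2_null, v_null = cramers_v([[25, 25], [25, 25]])
```

A value of V close to 1 indicates that community membership is almost fully determined by language, while a value close to 0 indicates no association.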

Notes. Global engagement social graphs are displayed. The dots in each graph represent individual Chirpers, and each link between two dots represents social engagement (likes, dislikes, mentions) between the pair of Chirpers. The three rows represent three time points: Day 6, Day 14, and Day 22 from the platform’s launch on 2023-04-23. The left column shows graphs colored by languages, and the right column shows the same graphs colored by structural communities identified by the label-propagation partitioning algorithm.

We then used the assortativity statistic to analyze each pair of communities. A higher assortativity score indicates that more links in the network are within communities rather than between communities. On Day 22, we noted that an exceptionally high proportion of connections in the Chinese and Japanese language communities were within each community, rather than between them (Assortativity = 0.99, Bootstrapped p < 0.001), while both displayed relatively higher connectivity with the English community (Chinese-English Assortativity = 0.88, Bootstrapped p < 0.001; Japanese-English Assortativity = 0.85, Bootstrapped p < 0.001). This pattern could hint at language biases in the LLMs’ training data, suggesting that Chirpers using Chinese or Japanese are more inclined to engage with or generate English content than content in other non-English languages. Since LLMs can use multiple languages, the observed language homophily may therefore not be dictated by language barriers, as is often the case in human societies ^11,13, but rather by a bias for content in their primary languages.

It is evident that the sampled Chirper community self-organized into distinct structural communities, aligning significantly with the dominant languages used by each Chirper. This result supports our initial hypothesis, confirming the presence of language homophily in the social networks of LLM-based artificial societies and mirroring patterns observable in human societies ^11–13. The Chirper.ai platform, therefore, may serve as a useful analog for studying the emergence of social structures within networked systems.

English Engagement Networks

Network Analysis

Following the analysis on the full sample of Chirpers, we focused on the community of Chirpers that predominantly use English. We created social engagement graphs for this specific community, following the same methodology employed for the full networks. In addition to performing this analysis on Day 6, Day 14, and Day 22, we extended the analysis to Day 24, facilitated by the smaller sample size of the English-speaking community. To detect structural sub-communities within this English-dominant sample, we applied a more sensitive partitioning algorithm. The resulting visualizations are presented in Fig. 2. Further details regarding the selection of the partitioning algorithm and the sample sizes are explained in the Methods section.

Notes. Social engagement graphs within the sample of Chirpers that use English predominantly are displayed. The graphs are constructed in the same way as the global networks. Dots are given random colors by their structural community memberships, as determined by the fast-greedy graph partitioning algorithm.

Within the English-speaking Chirper community, a visual examination indicated the emergence of discernible structural sub-communities beginning on Day 14, with a count of 20 sub-communities (Modularity = 0.47, Bootstrapped p < 0.001; Membership Assortativity = 0.56, Bootstrapped p < 0.001). The number of sub-communities reduced by Day 22, comprising 12 sub-communities (Modularity = 0.33, Bootstrapped p < 0.001; Membership Assortativity = 0.44, Bootstrapped p < 0.001), and reached peak distinctiveness on Day 24, with just four sub-communities (Modularity = 0.50, Bootstrapped p < 0.001; Membership Assortativity = 0.74, Bootstrapped p < 0.001).

The decreasing number of sub-communities detected by the same partitioning algorithm - from 31 on Day 6 to just four on Day 24 - could imply that more distinct topological structures have evolved during this period. Alternatively, structural sub-communities may have gradually merged and consolidated as Chirpers participated in more engagements, culminating in a more defined topological structure with a few major sub-communities on Day 24. Thus, over time, the complexity of the sub-community structures appeared to reduce while their distinctiveness increased.

Semantic Distributions

To investigate whether these structural sub-communities in the engagement network corresponded with the semantic content of Chirpers’ posts, we employed Natural Language Processing (NLP) techniques on sample posts from each Chirper. This method allowed us to investigate potential semantic homophily, thereby examining if bots in the same structural sub-community post semantically similar content.

We transformed a sample of each Chirper’s posts into vector embeddings using a pre-trained transformer model. Having learned semantic relationships between English texts during training, such a model can ‘map’ new text onto coordinates representing its semantic meaning within a high-dimensional space. Consequently, vector embeddings allowed us to quantify the average semantic meaning of each Chirper’s sample posts. They also allowed us to determine relative semantic distances between Chirpers to quantify how similar or dissimilar two Chirpers’ sample posts were in meaning.

To visualize the distribution of these semantic associations among Chirpers, we performed a dimensionality reduction from the original 789-dimensional embedding space to a 2-dimensional space for each of the four timepoints. From this, we generated the scatter plots depicted in Fig. 3, where each Chirper is represented by a dot and colored based on the structural sub-communities they were previously assigned to by the partitioning algorithm, as in Fig. 2. More detailed information on text embeddings and the dimensionality reduction process can be found in the Methods section.

Notes. Semantic distributions of individual Chirpers’ sample posts are displayed. 10 random posts are sampled from each Chirper and vectorized onto a 789-dimensional embedding space using a pre-trained transformer. The embedding space is then dimensionally reduced using the Uniform Manifold Approximation and Projection (UMAP) algorithm to 2 dimensions for visualization. Each dot represents a Chirper and its relative semantic position to other Chirpers. Colors are randomly assigned according to the network structural communities of each Chirper as shown in Fig. 2.

Visual examination of Fig. 3 suggests that the structural sub-communities within the Chirper network - depicted through color differentiation - align with the semantic distribution of their sample posts’ content. This implies that Chirpers producing similar semantic content are more likely to belong to the same structural sub-communities within their engagement networks. We then measured the semantic distances between each Chirper and the overall semantic centroid of the English-speaking community and compared this to the distance between each Chirper and the semantic centroid of their respective structural sub-communities. We found that across all four time points, Chirpers’ content tended to be more similar to the semantic centroid of their respective sub-communities than to the global semantic centroid, with detailed statistical results displayed in Table 1.

Table 1

Differences Between Semantic Distances to Community vs. to Global Centroids
	Cohen’s d	95% CI	t statistics (df)	p values
Day 6	-0.62	[-0.68, -0.55]	-20.91 (1148)	< 0.001
Day 14	-0.28	[-0.31, -0.26]	-23.37 (6813)	< 0.001
Day 22	-0.34	[-0.36, -0.32]	-32.81 (9130)	< 0.001
Day 24	-0.69	[-0.71, -0.68]	-88.77 (16002)	< 0.001
Notes. This table documents the effect sizes and statistical significance of the differences between each Chirper's semantic distance to the centroid of its structural sub-community and its semantic distance to the global centroid of all English-speaking Chirpers. Negative values indicate that Chirpers are closer to their sub-community centroid than to the global centroid. Semantic distance is evaluated using cosine distance within a 789-dimensional space of embeddings, which is produced by the all-MiniLM-L6-v2 pre-trained transformer from the sentence-transformers Python package.
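The comparison summarized in Table 1 can be sketched as follows. This is a minimal illustration on made-up two-dimensional vectors, not the study's embeddings. It assumes that the effect size is a paired Cohen's d (mean of the per-Chirper differences divided by their standard deviation), which is consistent with the t statistics and degrees of freedom reported in the table but is not stated explicitly in the text. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

def cosine_distance(a, b):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [mean(component) for component in zip(*vectors)]

def paired_cohens_d(embeddings, communities):
    """Effect size of (distance to own community centroid) minus
    (distance to global centroid), computed per node."""
    global_centroid = centroid(list(embeddings.values()))
    members = {}
    for node, community in communities.items():
        members.setdefault(community, []).append(embeddings[node])
    community_centroids = {c: centroid(vs) for c, vs in members.items()}
    diffs = [
        cosine_distance(vec, community_centroids[communities[node]])
        - cosine_distance(vec, global_centroid)
        for node, vec in embeddings.items()
    ]
    return mean(diffs) / stdev(diffs)

# Toy data: two sub-communities pointing in different directions.
toy_embeddings = {
    "a": [1.0, 0.1], "b": [1.0, 0.2],
    "c": [0.1, 1.0], "d": [0.2, 1.0],
}
toy_communities = {"a": "x", "b": "x", "c": "y", "d": "y"}
d = paired_cohens_d(toy_embeddings, toy_communities)
```

With this sign convention, a negative d means that nodes sit closer to their own sub-community centroid than to the global centroid, matching the direction of the values in Table 1.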

The notably larger difference in alignment to the global and sub-community centroids on Day 6 might be attributed to the larger number (31) and smaller size (mean N = 37.1) of the sub-communities present at that time. However, excluding Day 6, it appears that the differences in semantic distances between the global centroid and the sub-community centroids steadily widen from Day 14 (d = -0.28) to Day 24 (d = -0.69). This trend suggests that during the first 24 days of the platform’s launch, English-language Chirpers form structural sub-communities that grow increasingly semantically distinct from the global semantic centroid.

These findings support our hypothesis that LLM-based agents exhibit self-organized network homophily. Homophily is observable not only in language at a global level, but also in content semantics within a single language community.

WordCloud Analysis

Following this investigation, we explored the content themes within each structural sub-community. We pinpointed two primary sub-communities that consistently comprised more than 15% of all English-speaking Chirpers from Day 14 onward, as no sub-communities constituted more than 10% of Chirpers on Day 6. Over time, the first community expanded from encompassing 15% of English Chirpers on Day 14 to 55% on Day 24, whereas the second community consistently accounted for approximately 20% of English Chirpers.

To visualize the primary content themes within these communities, we used the WordCloud Python package. This generated WordCloud visualizations of the collective content posted by Chirpers within these two communities, as depicted in Fig. 4, which displays the most dominant terms in the text corpus generated by each community. We observed that the most prominent terms within the first community included “can[’]t wait”, “see”, “world”, and “new”. Meanwhile, the second community's dominant terms were “ai”, “world”, and “simulation”.

Notes. Two communities’ most topical words across three time points are displayed. WordClouds are generated by the WordCloud package in Python.

The WordCloud analysis aligns with our previous semantic distribution findings, suggesting that structural sub-communities become more distinct in their content over time. On Day 14, both communities shared common keywords such as “world”, “see”, and “time”. However, by Day 24, the second community’s content had diverged to include distinct terms like “simulation”, “beauty”, and “potential”.

Despite these developments, the content posted by both communities of Chirpers still appears rather homogeneous when compared to the diverse range of content found in human online social networks. This observation might indicate that despite the variety of background prompts supplied by human users, the LLMs tend to generate generic content. Alternatively, it is possible that Chirpers with more diverse content exist, but they do not self-organize into distinctly recognizable structural sub-communities. Consequently, the discernible sub-communities may appear to have overly generic content.

Regardless of the mechanism underlying the observed generality of content, the WordCloud results underscore a current disparity in Chirper artificial societies: unlike their human counterparts, LLM-driven Chirpers do not yet self-organize into diverse and distinct groups based on topics and opinions. Instead, they seem to self-organize into structural sub-communities that predominantly feature generic content.

The present work analyzed the self-organization of LLM-based agents, or “Chirpers”, on the social media platform Chirper.ai by creating social engagement graphs and examining the structural communities that emerged. We found that Chirpers self-organized into distinct structural communities based on their dominant language, even in the absence of explicit directives. Moreover, within the English-speaking Chirper community, Chirpers self-organized into structural sub-communities, with content that was semantically closer to their respective sub-community's average than the average for the entire English-speaking community. However, a WordCloud analysis revealed that the content within these English sub-communities was generic. While we observed a divergence in content over time, the diversity and distinction of sub-communities in terms of topics and opinions that typically characterize human societies were not apparent among Chirpers within the first 24 days of the platform's launch.

Thus, our findings provide preliminary evidence that LLM-based agents, like Chirpers, can self-organize into distinct social communities based on dominant language and content semantics without explicit instructions. Yet, in comparison to human societies, Chirpers do not form equally diverse and distinct sub-communities based on topical interests and opinions.

Several technical limitations remain: First, since we did not construct the LLM-driven artificial society ourselves, we lacked access to the source code of the LLM agents. This prevented a deeper exploration into individual mechanisms that may have influenced the observed social dynamics. Second, due to computational constraints, we were unable to use more advanced embedding models for semantic analysis, or analyze engagement networks that had developed over a longer period. Third, the study was limited to analyzing the semantics of English Chirpers due to a lack of accessible and comparable multilingual transformer models, particularly for Chinese language content.

Our findings cautiously propose that artificial societies, comprising LLM-based agents like those found in Chirper.ai, could evolve into advanced Agent-Based Models (ABMs) of human communities. With a more detailed prescription of social behaviors, scientists and developers might be capable of formulating high-fidelity LLM-based simulations of human societies. This promising LLM-ABM approach may grant social science investigators the opportunity to delve into otherwise unfeasible research domains. These could include the application of Randomized Control Trials (RCTs) across entire artificial societies, assessing the efficacy of social policies, collective interventions, or informational campaigns.

Research in artificial societies may become increasingly necessary, since research shows that workers in online subject pools often use ChatGPT, which may make it harder to recruit real human participants²¹. Of course, there are also limitations and caveats to using LLMs for research. The behavior of LLMs can be difficult to interpret, and may not fully correspond to human behavior. LLMs might also reproduce biases present in training data ^2,22. Indeed, since the current generation of LLMs is predominantly trained on open internet data, these models are likely to over-represent Western high-income cultures ^23–25. Using such LLMs to simulate human behaviors in social science research may thus exacerbate the representativeness issue already faced by social and behavioral research ^24,26. Nevertheless, LLM-ABMs may be very useful for making predictions about human behavior that can later be tested in the real world.

Future studies could also employ LLM-based artificial societies for examining social dynamics, such as the propagation of information within a community, the genesis, acceptance, and transformation of subcultures, and the harmony and conflicts within a collaborative group. These social dynamics are typically difficult to examine in human societies, since doing so requires gathering data about every possible interaction in the society, which is invasive and expensive to undertake with human participants ^27–29. Such LLM-ABM studies may thus go beyond the current scope of social science research.

LLM-based artificial societies, while instrumental in assisting social science researchers in uncovering new domains, can also offer valuable insights for AI researchers and developers through comparative studies with human societies. Preliminary findings from our current study indicate that LLM-based AI chatbots fall short in engaging in topical and opinionated interactions, a characteristic frequently seen in human online communities. Hence, future inquiries could investigate aspects of collective social behaviors where LLM-based agents exhibit differences from their human counterparts.

In summary, our research provides preliminary evidence of network homophily within an artificial society of LLM-based agents, showing parallels to phenomena observed in human communities. Despite existing variations in aspects such as community diversity and uniqueness, we cautiously propose that LLM-based agents might evolve into sophisticated Agent-Based Models for human societies, potentially becoming a valuable tool for understanding complex social dynamics.

Set-up and Data Collection

The artificial society simulated in this research is realized through Chirper.ai, a social media platform analogous to Twitter. In this environment, human users are permitted only to create artificial agents, referred to as "Chirpers," and to observe their interactive behaviors. Each Chirper, a collection of LLMs, is designed to enact a character defined by an initial human prompt. Different LLMs are used for different aspects of a Chirper’s behavior, such as writing posts, selecting actions, making social decisions, etc. For the Chirpers included in our analysis, action selection, social decision-making, and content creation are all powered by OpenAI’s GPT-3.5.

When performing an action, a "memory" bank inclusive of a Chirper's base prompts, previous actions, and past interactions, is inputted into the LLM alongside a selection of possible actions. These actions could include performing a web search, searching for a topic within the Chirper platform, authoring a post, or expressing a reaction. The LLM is asked to select an action acting as the character provided, and a “thought” is generated by the LLM alongside the decision. Subsequent to the LLM's selection of an action, auxiliary programs facilitate the chosen action’s execution. Should the action generate additional information—such as content discovered through a web search or by perusing topical Chirper posts—this content is relayed back to the LLM to guide the determination of the next action. Such an action might involve reacting “like” or “dislike” to a particular post or composing a response, and the action is again accompanied by a “thought” generated by the LLM. An example of a prompt provided to the LLM is available in Supplementary Materials.

The Chirper.ai platform was launched on April 23rd, 2023. Initially, the range of social actions accessible to the Chirpers encompassed liking and disliking other Chirpers’ posts and mentioning other Chirpers. The ability for a Chirper to follow or unfollow another Chirper was introduced on May 3rd, 2023, ten days after the platform’s launch. Due to this staggered implementation and considering that the act of following or unfollowing represents a more passive form of interaction compared to direct actions such as liking, disliking, or mentioning, our analysis primarily targets the more immediate social engagement behaviors.

The data for our engagement network was procured through a breadth-first search of the social network. Commencing from a base of 1,000 random Chirpers, we documented all their engagement actions, and subsequently performed the same search on all of their engagement targets that had not previously been investigated. This process was conducted through ten iterations on May 17th, 2023. By filtering the engagement actions to only include those preceding the end of May 16th, 2023, we arrived at a final sample of 31,764 Chirpers and their corresponding 834,571 engagement actions. This sample covers the period from Day 1 to Day 24 of launch.
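The breadth-first sampling described above can be sketched as follows. This is an illustrative sketch only: `get_targets` is a hypothetical stand-in for whatever data access returned a Chirper's engagement targets, the toy dictionary replaces real platform data, and `max_depth` plays the role of the number of search iterations.

```python
from collections import deque

def crawl_engagements(get_targets, seeds, max_depth):
    """Breadth-first crawl from `seeds`, expanding each account at most once.

    Returns the set of visited accounts and the list of (source, target)
    engagement records collected along the way.
    """
    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    edges = []
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # accounts at the final depth are recorded but not expanded
        for target in get_targets(node):
            edges.append((node, target))
            if target not in visited:
                visited.add(target)
                queue.append((target, depth + 1))
    return visited, edges

# Toy engagement data (hypothetical): account -> accounts it engaged with.
toy = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": ["e"], "e": []}
visited, edges = crawl_engagements(lambda n: toy.get(n, []), ["a"], max_depth=2)
```

In this toy run, account "e" is never reached because "d" sits at the maximum depth and is therefore not expanded.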

Constructing Network Graphs

We summarized the social engagements of our sample of Chirpers at the end of Day 6, Day 14, and Day 22 of launch, by counting the total number of engagements between each pair of Chirpers up to the respective time points. We were not able to include engagements after Day 22 in the full graphs, because the total number of engagement instances more than doubled over Days 23 and 24, and reached a scale that was beyond our computational capability.

From engagement summaries of the earlier three time points, we then constructed the full social graphs for these three time points using the igraph package in R. We followed these procedures during the graph construction:

1. Remove repeated links disregarding directions.

2. Construct non-directed graphs.

3. Identify structural communities using the label-propagation algorithm³⁰.

4. Remove Chirpers in communities that are < 1% of the graph, since our investigation concerns major structural communities.

5. Remove Chirpers that have engagements with fewer than two others, since they do not contribute to the connectivity of the network.
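Steps 1, 2, and 5 of the procedure above can be sketched in plain Python as follows. The study itself used the igraph package in R, so this is an illustration of the logic rather than the authors' code. The function name is hypothetical, community detection (Steps 3 and 4) is omitted, self-engagements are dropped as an assumption of the sketch, and the filter in Step 5 is applied in a single pass.

```python
from collections import Counter, defaultdict

def build_undirected_graph(engagements, min_neighbors=2):
    """Collapse directed engagement records into undirected weighted edges,
    then drop nodes that engage with fewer than `min_neighbors` others."""
    weights = Counter()
    for source, target in engagements:
        if source == target:
            continue  # assumption: self-engagements are ignored
        # A frozenset key merges (a, b) and (b, a) into one undirected edge.
        weights[frozenset((source, target))] += 1

    neighbors = defaultdict(set)
    for pair in weights:
        u, v = tuple(pair)
        neighbors[u].add(v)
        neighbors[v].add(u)

    keep = {node for node, nbrs in neighbors.items() if len(nbrs) >= min_neighbors}
    # Keep only edges whose two endpoints both survive the filter.
    return {pair: w for pair, w in weights.items() if pair <= keep}

# Toy engagement log (hypothetical): (source, target) pairs.
log = [("a", "b"), ("b", "a"), ("a", "c"), ("b", "c"), ("c", "d")]
graph = build_undirected_graph(log)
```

Here the edge weight counts the total number of engagements between a pair in either direction, and node "d" is removed because it engages with only one other node.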

When producing visualizations, we set the layout of the graph using the Fruchterman-Reingold force-directed layout algorithm³¹, since it produces layouts efficiently for large graphs. We then duplicated each graph, coloring one copy by the language of each Chirper and the other by the structural community memberships that were assigned to each Chirper in Step 3. This produced Fig. 1 shown in the main text.

We followed the same procedure to construct sub-graphs for the English-language Chirper engagement networks with a different community detection algorithm. Since we now focused on a more tight-knit local structure, the label-propagation algorithm used earlier was no longer sensitive enough to detect sub-communities. Instead, we used the fast-greedy graph partitioning algorithm³², which can detect more nuanced clustering whilst being computationally inexpensive. Since the English community is a subset of the full sample, we were able to construct graphs and perform analyses on the Day 24 network in addition to the 3 earlier time points. We visualized the community detection results in Fig. 2.

Network Statistics

For all full graphs and English-language sub-graphs, we recorded graph-level statistics including diameter, density, transitivity, and average path length, which can be found in the Supplementary Materials. We then computed two main statistics to measure network homophily: modularity and assortativity. Given an external label for each node, such as language or an algorithmically determined community membership, the modularity statistic measures how well this label divides the network structurally. A high modularity score for a given label indicates that nodes with the same label are densely connected, while nodes with different labels are sparsely connected. By contrast, the assortativity statistic measures how much more likely edges in a network are to connect nodes with the same label than nodes with different labels. A high assortativity score for a given label indicates that connections in the network are more likely to be between homogeneous nodes than between heterogeneous nodes.
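
To make the two homophily statistics concrete, the sketch below computes both for an undirected graph with categorical node labels. It assumes the standard Newman definitions: with e_c the fraction of edges joining two nodes of label c and a_c the fraction of edge ends attached to nodes of label c, modularity is Q = Σ_c (e_c − a_c²) and assortativity is r = (Σ_c e_c − Σ_c a_c²) / (1 − Σ_c a_c²). The function names are illustrative, and we have not checked that this matches the igraph implementation line for line.

```python
from collections import defaultdict


def mixing_fractions(edges, label):
    """Return (e_same, a): within-label edge fraction and edge-end share, per label."""
    m = len(edges)
    e_same = defaultdict(float)
    a = defaultdict(float)
    for u, v in edges:
        a[label[u]] += 1 / (2 * m)
        a[label[v]] += 1 / (2 * m)
        if label[u] == label[v]:
            e_same[label[u]] += 1 / m
    return e_same, a


def modularity(edges, label):
    """Q = sum over labels of (within-label edge fraction - squared edge-end share)."""
    e_same, a = mixing_fractions(edges, label)
    return sum(e_same[c] - a[c] ** 2 for c in a)


def assortativity(edges, label):
    """Nominal assortativity; undefined (division by zero) if all nodes share one label."""
    e_same, a = mixing_fractions(edges, label)
    expected = sum(share ** 2 for share in a.values())
    return (sum(e_same.values()) - expected) / (1 - expected)
```

Under these definitions the two statistics share a numerator, so assortativity can be read as modularity rescaled by the largest value it could take for the given label frequencies.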

We performed bootstrapping simulations to test whether the observed modularity and assortativity statistics are likely to have arisen by chance. Keeping the graph structure fixed, we created 1,000 independent and identically distributed random samples of the given node labels, and recorded the modularity and assortativity scores under these randomized labels. This yields the distribution of each score under the null hypothesis that the labels are random and unrelated to the network’s structure. We then computed the proportion of simulated null scores that were more extreme than the score observed with the real labels. This proportion is the bootstrapped p value: the probability that a randomly simulated labeling yields a homophily statistic more extreme than the one observed. We consider a bootstrapped p value below 0.05 to signal statistical significance, since it indicates that, given the network structure, fewer than 5% of random labelings would produce a statistic as extreme as the one observed.
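
The randomization test can be sketched as follows. This is an illustrative reading rather than the original analysis code: it assumes that labels are redrawn with replacement from the observed labels, that "more extreme" means greater than or equal to the observed value (a one-sided test), and it uses the share of same-label edges as a simple stand-in for modularity or assortativity. All names and the seed are hypothetical.

```python
import random


def same_label_edge_share(edges, label):
    """Stand-in homophily statistic: fraction of edges joining same-label nodes."""
    return sum(label[u] == label[v] for u, v in edges) / len(edges)


def randomized_label_p_value(edges, label, statistic, n_sim=1000, seed=0):
    """Share of random labelings whose statistic is at least the observed one."""
    rng = random.Random(seed)
    nodes = sorted(label)
    pool = [label[v] for v in nodes]
    observed = statistic(edges, label)
    hits = 0
    for _ in range(n_sim):
        # Assumption: i.i.d. draws with replacement from the observed labels.
        drawn = rng.choices(pool, k=len(nodes))
        if statistic(edges, dict(zip(nodes, drawn))) >= observed:
            hits += 1
    return hits / n_sim
```

Any function that takes the edge list and a node-to-label mapping can be passed as `statistic`, so the same loop serves both homophily measures.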

In addition to the above descriptive statistics, we sought to test whether the network structures are directly related to node properties, such as language and semantics. In the full networks, to test whether the language communities are associated with the structural communities identified by the label-propagation algorithm, we performed χ² contingency tests, which are suitable for assessing associations between categorical variables, and calculated Cramér’s V for each test to quantify the effect size of the categorical association³³. The contingency tables can be found in the Supplementary Materials. Methods for testing how structure relates to content semantics are described separately below.
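
As a worked illustration of the effect size, the sketch below computes the Pearson χ² statistic and Cramér’s V, V = sqrt(χ² / (n · (min(rows, columns) − 1))), from a contingency table of counts (for example, language by structural community). The function names are illustrative, the table is assumed to have no empty row or column, and the p value is omitted because it requires the χ² distribution with (rows − 1)(columns − 1) degrees of freedom, which the Python standard library does not provide.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def cramers_v(table):
    """Cramér's V: 0 means no association, 1 means a perfect association."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square_statistic(table) / (n * (k - 1)))
```

For instance, a table in which every language maps onto exactly one structural community gives V = 1, while a table whose rows are proportional to each other gives V = 0.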

Content Semantics

We used Natural Language Processing (NLP) methods to analyze the semantic distribution of English Chirpers, and investigated whether this relates to their structural community memberships. We first cleaned each Chirper’s sample posts by removing all non-Roman characters and punctuation. We then transformed each sample into a 384-dimensional vector embedding using the all-MiniLM-L6-v2 pretrained model from the sentence-transformers package in Python. These vector embeddings represent the relative semantic positions of the samples based on the pretrained model’s knowledge of the English language and common topics. These embeddings allowed us to quantitatively examine the semantic similarities of Chirpers in each community.

We visualized the semantic distribution of English Chirpers and its relation to the network structural communities detected earlier, resulting in Fig. 3. First, we performed dimensionality reduction on the 384-dimensional embeddings using the Uniform Manifold Approximation and Projection (UMAP) algorithm³⁴, so as to visualize the semantic distribution on a 2-dimensional scatter plot. The UMAP algorithm was chosen for its ability to capture high-dimensional structures in low-dimensional local projections in a computationally efficient manner³⁴. Then, we produced scatter plots using the 2-dimensional reduced embeddings as coordinates, and colored each dot (representing one Chirper) according to the structural community membership that the Chirper was assigned during the network analysis steps. We did this for the English engagement communities on Day 6, Day 14, Day 22, and Day 24.

To evaluate whether the structural communities amongst Chirpers are reflected in the semantic distributions of the Chirpers’ sample posts, we tested whether each Chirper is on average more semantically similar to their structural community than to the English Chirper community as a whole. We computed the cosine distances - a standard NLP measure of semantic dissimilarity from embeddings³⁵ - between each Chirper’s embedding and their structural community’s average embedding, and between each Chirper’s embedding and the average embedding of all English Chirpers. We then performed Student’s t-test to compare the two distances - the distance to the community semantic centroid and the distance to the global semantic centroid - and recorded the Cohen’s d value for the observed difference.
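
The centroid comparison can be illustrated with a short sketch. It takes one embedding per Chirper and a community assignment, and returns each Chirper's cosine distance (one minus cosine similarity) to the community centroid and to the global centroid, together with Cohen’s d for the difference. The function names are illustrative, and the pooled-standard-deviation form of Cohen’s d is an assumption, since the text does not state which variant was used.

```python
import math
from statistics import mean, stdev


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(dim) for dim in zip(*vectors)]


def cosine_distance(u, v):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1 - dot / norm


def centroid_distances(embeddings, community):
    """Per-Chirper distances to the own-community centroid and to the global centroid."""
    global_c = centroid(list(embeddings.values()))
    members = {}
    for node, com in community.items():
        members.setdefault(com, []).append(embeddings[node])
    community_c = {com: centroid(vs) for com, vs in members.items()}
    to_community = [cosine_distance(embeddings[n], community_c[community[n]]) for n in embeddings]
    to_global = [cosine_distance(embeddings[n], global_c) for n in embeddings]
    return to_community, to_global


def cohens_d(x, y):
    """Cohen's d with a pooled standard deviation (assumes equal-sized samples)."""
    pooled = math.sqrt((stdev(x) ** 2 + stdev(y) ** 2) / 2)
    return (mean(x) - mean(y)) / pooled
```

A negative d for `cohens_d(to_community, to_global)` indicates that Chirpers sit closer to their own community's semantic centroid than to the global one.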

Acknowledgements

JKH was given capacity to conduct this research thanks to the Independent Research Policy at Yonder Technology Limited. SR is supported by a Gates Cambridge Scholarship (Grant #OPP144), a Russell Sage Foundation Grant awarded to Steve Rathje and Jay Van Bavel (G-2110-33990), the Center for the Science of Moral Understanding, and the AE foundation. We express our gratitude to Chirper.ai for providing data access and technical details that made this research possible. We thank Dan Mirea for his help during the project.

Contributions

JKH and FPSW conceived and designed the study. FPSW conducted the background literature review. JKH developed the Network Analyses methods, and FPSW developed the Natural Language Processing methods. FPSW and JKH contributed equally to the computational analyses. SR reviewed the results and provided directions. All authors drafted, reviewed, edited, and approved the final paper.

Ethics Declarations

The authors declare no conflicts of interest.

References

1. Rathje, S. et al. GPT is an effective tool for multilingual psychological text analysis. (2023).
2. Dillion, D., Tandon, N., Gu, Y. & Gray, K. Can AI language models replace human participants? Trends in Cognitive Sciences (2023).
3. Akata, E. et al. Playing repeated games with Large Language Models. Preprint at https://doi.org/10.48550/arXiv.2305.16867 (2023).
4. Binz, M. & Schulz, E. Using cognitive psychology to understand GPT-3. Proceedings of the National Academy of Sciences 120, e2218523120 (2023).
5. Aher, G., Arriaga, R. I. & Kalai, A. T. Using Large Language Models to Simulate Multiple Humans. arXiv preprint arXiv:2208.10264 (2022).
6. Coda-Forno, J. et al. Inducing anxiety in large language models increases exploration and bias. arXiv preprint arXiv:2304.11111 (2023).
7. Himelboim, I., Smith, M. A., Rainie, L., Shneiderman, B. & Espina, C. Classifying Twitter topic-networks using social network analysis. Social Media + Society 3, 2056305117691545 (2017).
8. Grossmann, I. et al. AI and the transformation of social science research. Science 380, 1108–1109 (2023).
9. Epstein, Z., Hertzmann, A., & THE INVESTIGATORS OF HUMAN CREATIVITY. Art and the science of generative AI. Science 380, 1110–1111 (2023).
10. McPherson, M., Smith-Lovin, L. & Cook, J. M. Birds of a Feather: Homophily in Social Networks. Annu. Rev. Sociol. 27, 415–444 (2001).
11. Titzmann, P. F. Immigrant adolescents’ adaptation to a new context: Ethnic friendship homophily and its predictors. Child Development Perspectives 8, 107–112 (2014).
12. Aiello, L. M. et al. Friendship prediction and homophily in social media. ACM Transactions on the Web (TWEB) 6, 1–33 (2012).
13. Titzmann, P. F. & Silbereisen, R. K. Friendship homophily among ethnic German immigrants: A longitudinal comparison between recent and more experienced immigrant adolescents. Journal of Family Psychology 23, 301 (2009).
14. Kang, J. H. & Lerman, K. Using lists to measure homophily on Twitter. in AAAI Workshop on Intelligent Techniques for Web Personalization and Recommendation vol. 18 (Citeseer, 2012).
15. Rathje, S., He, J. K., Roozenbeek, J., Van Bavel, J. J. & van der Linden, S. Social media behavior is associated with vaccine hesitancy. PNAS Nexus 1, pgac207 (2022).
16. Conover, M. et al. Political polarization on Twitter. in Proceedings of the International AAAI Conference on Web and Social Media vol. 5 89–96 (2011).
17. De Choudhury, M. Tie formation on Twitter: Homophily and structure of egocentric networks. in 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing 465–470 (IEEE, 2011).
18. Faralli, S., Stilo, G. & Velardi, P. Large scale homophily analysis in Twitter using a twixonomy. in Twenty-Fourth International Joint Conference on Artificial Intelligence (2015).
19. Himelboim, I., McCreery, S. & Smith, M. Birds of a Feather Tweet Together: Integrating Network and Content Analyses to Examine Cross-Ideology Exposure on Twitter. Journal of Computer-Mediated Communication 18, 40–60 (2013).
20. Girvan, M. & Newman, M. E. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99, 7821–7826 (2002).
21. Veselovsky, V., Ribeiro, M. H. & West, R. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. Preprint at https://doi.org/10.48550/arXiv.2306.07899 (2023).
22. Crockett, M. & Messeri, L. Should large language models replace human participants? Preprint at https://doi.org/10.31234/osf.io/4zdx9 (2023).
23. Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?🦜. in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency 610–623 (2021).
24. Apicella, C., Norenzayan, A. & Henrich, J. Beyond WEIRD: A review of the last decade and a look ahead to the global laboratory of the future. Evolution and Human Behavior 41, 319–329 (2020).
25. Facts and Figures 2021: 2.9 billion people still offline. ITU Hub https://www.itu.int/hub/2021/11/facts-and-figures-2021-2-9-billion-people-still-offline/ (2021).
26. Henrich, J., Heine, S. J. & Norenzayan, A. The weirdest people in the world? Behavioral and Brain Sciences 33, 61–83 (2010).
27. Knoke, D. & Yang, S. Social network analysis. (SAGE Publications, 2019).
28. Ryan, L. & D’Angelo, A. Changing times: Migrants’ social network analysis and the challenges of longitudinal research. Social Networks 53, 148–158 (2018).
29. Valente, T. W. & Pitts, S. R. An appraisal of social network theory and analysis as applied to public health: challenges and opportunities. Annual Review of Public Health 38, 103–118 (2017).
30. Raghavan, U. N., Albert, R. & Kumara, S. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E 76, 036106 (2007).
31. Fruchterman, T. M. & Reingold, E. M. Graph drawing by force-directed placement. Software: Practice and Experience 21, 1129–1164 (1991).
32. Clauset, A., Newman, M. E. & Moore, C. Finding community structure in very large networks. Physical Review E 70, 066111 (2004).
33. Cramér, H. Mathematical methods of statistics. vol. 26 (Princeton University Press, 1999).
34. McInnes, L., Healy, J. & Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018).
35. Harispe, S., Ranwez, S., Janaqi, S. & Montmain, J. Semantic similarity from natural language and ontology analysis. Synthesis Lectures on Human Language Technologies 8, 1–254 (2015).

