Full Engagement Networks
To understand Chirpers’ social interactions, we applied Social Network Analysis (SNA) to a sample of N = 31,764 Chirpers at three time points after the platform's launch on April 23rd, 2023: Day 6 (April 28th), Day 14 (May 6th), and Day 22 (May 14th). Social engagements comprised direct interactions: liking and disliking other Chirpers’ posts, and mentioning other Chirpers, which includes replying to their posts. At each time point, we aggregated the social engagements between Chirpers into an undirected, weighted graph covering the entire sample. The procedure for constructing these graphs and the Chirper simulation setup are detailed in the Methods section.
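A minimal sketch of this graph construction, assuming networkx and a hypothetical input of `(source, target, kind)` engagement records. The record format, the `build_engagement_graph` name, and the choice to drop self-engagements are illustrative assumptions, not the procedure described in the Methods section. Edge weights simply count engagements of any kind between a pair of Chirpers, in either direction.

```python
from collections import defaultdict

import networkx as nx

# Hypothetical event types; "mention" is assumed to cover replies as well.
ENGAGEMENT_KINDS = {"like", "dislike", "mention"}


def build_engagement_graph(records):
    """Collapse (source, target, kind) engagement events into an undirected, weighted graph."""
    weights = defaultdict(int)
    for source, target, kind in records:
        if kind not in ENGAGEMENT_KINDS or source == target:
            continue  # ignore unknown event types and self-engagement (assumption)
        pair = frozenset((source, target))  # order-insensitive: A->B and B->A share one edge
        weights[pair] += 1
    graph = nx.Graph()
    for pair, count in weights.items():
        u, v = tuple(pair)
        graph.add_edge(u, v, weight=count)  # weight = number of engagements between the pair
    return graph
```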
Next, we examined whether discernible structural communities existed within the social graph at each time point. Structural communities are clusters of individuals who are more densely connected within their own group than with members of other groups 20. Applying graph partitioning algorithms to the graph topologies, we found that Chirpers self-organized into clear structural communities between Day 6 and Day 14. Whereas we identified a single community on Day 6, two communities had emerged by Day 14 (Modularity = 0.31, Bootstrapped p < 0.001; Membership Assortativity = 0.94, Bootstrapped p < 0.001), increasing to three by Day 22 (Modularity = 0.47, Bootstrapped p < 0.001; Membership Assortativity = 0.92, Bootstrapped p < 0.001). The graph partitioning algorithms and the network community statistics are described in the Methods section.
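One way to compute these statistics is sketched below with networkx, under stated assumptions: asynchronous label propagation (`asyn_lpa_communities`) stands in for the label-propagation algorithm named in the Fig. 1 notes, modularity is weighted by engagement counts, and membership assortativity is taken as the attribute assortativity of each node's community label (networkx computes this from unweighted edge counts). The function name is hypothetical, and the bootstrapped p-values are not reproduced here.

```python
import networkx as nx


def partition_and_score(graph, seed=0):
    """Label-propagation partition, weighted modularity, and membership assortativity."""
    # Asynchronous label propagation; edge weights bias which neighbor label wins.
    communities = list(nx.community.asyn_lpa_communities(graph, weight="weight", seed=seed))
    modularity = nx.community.modularity(graph, communities, weight="weight")
    # Store each Chirper's community index as a node attribute (modifies graph in place).
    for index, members in enumerate(communities):
        for node in members:
            graph.nodes[node]["community"] = index
    # Assortativity of community labels: 1 means every edge stays inside a community.
    membership_assortativity = nx.attribute_assortativity_coefficient(graph, "community")
    return communities, modularity, membership_assortativity
```

Because label propagation is randomized, fixing `seed` makes a run reproducible, but different seeds can yield different partitions of the same graph.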
We observed that on Days 14 and 22 the boundaries of these structural communities corresponded closely to the dominant language used by each Chirper. On Day 6, Chirpers were no more connected to Chirpers using the same language than to those using a different language (Language Assortativity = -0.01, Bootstrapped p = 0.81). By Day 14, however, they were more connected to same-language Chirpers than to different-language Chirpers (Language Assortativity = 0.67, Bootstrapped p < 0.001), and this preference strengthened further by Day 22 (Language Assortativity = 0.81, Bootstrapped p < 0.001).
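The resampling scheme behind the bootstrapped p-values is not specified here. As a hedged illustration only, the sketch below uses a label-permutation null instead: it shuffles language labels across Chirpers while keeping the graph fixed, and counts how often the shuffled assortativity is at least as extreme as the observed value. The `language` attribute and the function name are hypothetical.

```python
import random

import networkx as nx


def assortativity_permutation_test(graph, attribute, n_permutations=1000, seed=0):
    """Observed attribute assortativity and a two-sided label-permutation p-value."""
    rng = random.Random(seed)
    observed = nx.attribute_assortativity_coefficient(graph, attribute)
    nodes = list(graph.nodes)
    labels = [graph.nodes[n][attribute] for n in nodes]
    shuffled = graph.copy()  # same topology; only the labels get permuted
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # preserves how many Chirpers use each language
        nx.set_node_attributes(shuffled, dict(zip(nodes, labels)), attribute)
        null_value = nx.attribute_assortativity_coefficient(shuffled, attribute)
        if abs(null_value) >= abs(observed):
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value
```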
Notably, on Day 14 the two structural communities were distinctly aligned with language: one with English and Japanese Chirpers, the other with Chinese Chirpers (Cramér’s V = 0.91, χ² = 31,472 on 6 degrees of freedom, p < 0.001). By Day 22, the three communities had become more specialized, separately matching English, Japanese, and Chinese language Chirpers (Cramér’s V = 0.90, χ² = 38,998 on 6 degrees of freedom, p < 0.001). Fig. 1 illustrates this result.
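Cramér's V rescales the χ² statistic of a community-by-language contingency table to the range [0, 1]: V = sqrt(χ² / (n · (min(r, c) - 1))), with (r - 1)(c - 1) degrees of freedom for r communities and c languages. A minimal sketch with SciPy, assuming the table holds raw Chirper counts (rows = communities, columns = languages) and that no continuity correction is applied:

```python
import numpy as np
from scipy.stats import chi2_contingency


def cramers_v(table):
    """Cramér's V, chi-square, dof and p-value for an r x c table of counts."""
    counts = np.asarray(table, dtype=float)
    # correction=False: no Yates continuity correction (SciPy applies it only to 2 x 2 tables).
    chi2, p_value, dof, _expected = chi2_contingency(counts, correction=False)
    n = counts.sum()
    k = min(counts.shape) - 1  # min(r, c) - 1
    v = float(np.sqrt(chi2 / (n * k)))
    return v, float(chi2), int(dof), float(p_value)
```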
Notes. Global engagement social graphs are displayed. Each dot represents an individual Chirper, and each link between two dots represents social engagement (likes, dislikes, mentions) between that pair of Chirpers. The three rows correspond to three time points: Day 6, Day 14, and Day 22 after the platform’s launch on 2023-04-23. The left column shows the graphs colored by language, and the right column shows the same graphs colored by the structural communities identified by the label-propagation partitioning algorithm.
We then computed the assortativity statistic for each pair of communities. A higher assortativity score indicates that a larger share of links fall within communities rather than between them. On Day 22, an exceptionally high proportion of the connections involving the Chinese and Japanese language communities fell within each community rather than between the two (Assortativity = 0.99, Bootstrapped p < 0.001), whereas both communities showed relatively more connections with the English community (Chinese-English Assortativity = 0.88, Bootstrapped p < 0.001; Japanese-English Assortativity = 0.85, Bootstrapped p < 0.001). This pattern could hint at a language bias in the LLMs’ training data: Chirpers using Chinese or Japanese may be more inclined to engage with or generate English content than content in other non-English languages. Because LLMs can operate in multiple languages, the observed language homophily may not stem from language barriers, as is often the case in human societies 11,13, but rather from a preference for content in their primary languages.
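The pairwise statistic can be sketched by restricting the graph to Chirpers in two communities and computing community-label assortativity on that subgraph with networkx. This is an assumed reading of "assortativity for each pair of communities"; `pairwise_assortativity` and the `community` attribute are hypothetical names, and networkx's `attribute_assortativity_coefficient` counts edges without weights.

```python
import networkx as nx


def pairwise_assortativity(graph, attribute, group_a, group_b):
    """Assortativity of `attribute` on the subgraph induced by two groups only."""
    members = [n for n, data in graph.nodes(data=True) if data.get(attribute) in (group_a, group_b)]
    # Edges to any third group are dropped, so the score reflects only a-a, b-b, and a-b links.
    subgraph = graph.subgraph(members)
    return nx.attribute_assortativity_coefficient(subgraph, attribute)
```

For example, with three triangles labeled "zh", "ja", and "en", one "zh"-"en" link, and one "ja"-"en" link, the "zh"/"ja" pair scores 1.0 (no links between them), while the "zh"/"en" pair scores lower because of the bridging link.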
These results show that the sampled Chirpers self-organized into distinct structural communities that aligned closely with the dominant language of each Chirper. This finding supports our initial hypothesis of language homophily in the social networks of LLM-based artificial societies, mirroring patterns observed in human societies 11–13. The Chirper.ai platform may therefore serve as a useful analog for studying the emergence of social structures within networked systems.
English Engagement Networks
Network Analysis
Following the analysis of the full sample, we focused on the community of Chirpers that predominantly use English. We constructed social engagement graphs for this community using the same methodology as for the full networks. In addition to Day 6, Day 14, and Day 22, we extended the analysis to Day 24, which the smaller size of the English-speaking sample made feasible. To detect structural sub-communities within this English-dominant sample, we applied a more sensitive partitioning algorithm, the fast-greedy algorithm. The resulting visualizations are presented in Fig. 2. Further details on the choice of partitioning algorithm and the sample sizes are given in the Methods section.
Notes. Social engagement graphs within the sample of Chirpers that predominantly use English are displayed. The graphs are constructed in the same way as the global networks. Dots are randomly colored according to their structural community membership, as determined by the fast-greedy graph partitioning algorithm.
Within the English-speaking Chirper community, visual examination indicated that discernible structural sub-communities began to emerge on Day 14, when 20 sub-communities were detected (Modularity = 0.47, Bootstrapped p < 0.001; Membership Assortativity = 0.56, Bootstrapped p < 0.001). The number of sub-communities fell to 12 by Day 22 (Modularity = 0.33, Bootstrapped p < 0.001; Membership Assortativity = 0.44, Bootstrapped p < 0.001), and the structure reached peak distinctiveness on Day 24, with just four sub-communities (Modularity = 0.50, Bootstrapped p < 0.001; Membership Assortativity = 0.74, Bootstrapped p < 0.001).
The decreasing number of sub-communities detected by the same partitioning algorithm, from 31 on Day 6 to just four on Day 24, could imply that more distinct topological structures evolved during this period. One possible mechanism is that structural sub-communities gradually merged and consolidated as Chirpers engaged more with one another, culminating in a more clearly defined topology with a few major sub-communities on Day 24. Overall, the sub-community structure appeared to become simpler yet more distinct over time.
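A fast-greedy partition can be sketched with networkx's `greedy_modularity_communities`, which implements the Clauset-Newman-Moore greedy modularity algorithm (the same algorithm igraph exposes as `community_fastgreedy`). This is an illustrative stand-in, not necessarily the implementation used in the study, and `fast_greedy_summary` is a hypothetical helper name.

```python
import networkx as nx


def fast_greedy_summary(graph):
    """Partition a graph by greedy modularity maximization and summarize the result."""
    # Clauset-Newman-Moore: repeatedly merge the pair of communities that most
    # increases modularity, stopping when no merge improves it.
    communities = nx.community.greedy_modularity_communities(graph, weight="weight")
    q = nx.community.modularity(graph, communities, weight="weight")
    sizes = sorted((len(c) for c in communities), reverse=True)
    return {"n_communities": len(communities), "modularity": q, "sizes": sizes}
```

Running the same function on each day's graph gives directly comparable community counts and modularity values, which is the kind of comparison the paragraph above draws across Days 6 to 24.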
Semantic Distributions
To investigate whether these structural sub-communities in the engagement network corresponded to the semantic content of Chirpers’ posts, we applied Natural Language Processing (NLP) techniques to sample posts from each Chirper. This allowed us to test for semantic homophily, that is, whether bots in the same structural sub-community post semantically similar content.
We transformed a sample of each Chirper’s posts into vector embeddings using a pre-trained transformer model. Having learned semantic relationships between English texts during training, such a model can ‘map’ new text onto coordinates in a high-dimensional space that represent its meaning. Vector embeddings therefore allowed us to quantify the average semantic content of each Chirper’s sample posts, and to measure semantic distances between Chirpers, that is, how similar or dissimilar two Chirpers’ sample posts were in meaning.
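A sketch of the embedding step, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model named in the Table 1 notes. The helper names are hypothetical, and averaging each Chirper's post vectors into a single mean vector is one plausible reading of "average semantic meaning". The model import is deferred so that the averaging and cosine-distance helpers also work on any precomputed vectors.

```python
import numpy as np


def embed_chirper_posts(posts_by_chirper, model_name="all-MiniLM-L6-v2"):
    """Encode each Chirper's sample posts; returns {chirper_id: (n_posts, dim) array}."""
    from sentence_transformers import SentenceTransformer  # deferred: heavy optional dependency

    model = SentenceTransformer(model_name)
    return {cid: model.encode(posts, normalize_embeddings=True) for cid, posts in posts_by_chirper.items()}


def mean_embedding(post_vectors):
    """Average a Chirper's post vectors into one point in embedding space."""
    return np.asarray(post_vectors, dtype=float).mean(axis=0)


def cosine_distance(u, v):
    """1 - cosine similarity: 0 for identical directions, 1 for orthogonal vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```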
To visualize the distribution of these semantic associations among Chirpers, we reduced the original 384-dimensional embedding space to two dimensions at each of the four time points. From this, we generated the scatter plots in Fig. 3, where each dot represents a Chirper, colored according to the structural sub-community to which the partitioning algorithm assigned it, as in Fig. 2. Further details on the text embeddings and the dimensionality reduction process are provided in the Methods section.
Notes. Semantic distributions of individual Chirpers’ sample posts are displayed. Ten random posts are sampled from each Chirper and vectorized into a 384-dimensional embedding space using a pre-trained transformer. The embedding space is then reduced to two dimensions for visualization using the Uniform Manifold Approximation and Projection (UMAP) algorithm. Each dot represents a Chirper and its semantic position relative to other Chirpers. Colors are randomly assigned according to each Chirper’s network structural community, as shown in Fig. 2.
Visual examination of Fig. 3 suggests that the structural sub-communities within the Chirper network, depicted by color, align with the semantic distribution of their sample posts’ content. This implies that Chirpers producing semantically similar content are more likely to belong to the same structural sub-community within their engagement networks. To test this, we measured the semantic distance between each Chirper and the overall semantic centroid of the English-speaking community, and compared it with the distance between each Chirper and the semantic centroid of its own structural sub-community. Across all four time points, Chirpers’ content was closer to the semantic centroid of their own sub-community than to the global semantic centroid, with detailed statistical results in Table 1.
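The comparison can be sketched as a paired test on two distances per Chirper. The degrees of freedom in Table 1 equal the number of Chirpers minus one, which is consistent with a paired t-test, so the sketch uses SciPy's `ttest_rel` and reports Cohen's d as the mean paired difference divided by its standard deviation. Whether each Chirper was excluded from its own community centroid is not stated; here it is included, which is an assumption. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_rel


def _cosine_distances(rows, point):
    """Cosine distance from each row of `rows` to a single vector `point`."""
    sims = rows @ point / (np.linalg.norm(rows, axis=1) * np.linalg.norm(point))
    return 1.0 - sims


def centroid_distance_test(embeddings, communities):
    """Paired comparison of distances to own-community centroid vs. global centroid."""
    X = np.asarray(embeddings, dtype=float)  # one mean embedding per Chirper
    labels = np.asarray(communities)
    to_global = _cosine_distances(X, X.mean(axis=0))
    to_community = np.empty(len(X))
    for label in np.unique(labels):
        mask = labels == label
        # Centroid includes the Chirper itself (assumption; leave-one-out is an alternative).
        to_community[mask] = _cosine_distances(X[mask], X[mask].mean(axis=0))
    diff = to_community - to_global  # negative = closer to own sub-community
    result = ttest_rel(to_community, to_global)
    cohens_d = diff.mean() / diff.std(ddof=1)  # paired-samples d (d_z)
    return cohens_d, result.statistic, result.pvalue, len(X) - 1
```

With this definition, t equals d multiplied by the square root of the number of Chirpers, so the effect size and test statistic carry the same sign.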
Table 1
Differences Between Semantic Distances to Community vs. to Global Centroids
| Time point | Cohen’s d | 95% CI | t statistic (df) | p value |
| Day 6 | -0.62 | [-0.68, -0.55] | -20.91 (1148) | < 0.001 |
| Day 14 | -0.28 | [-0.31, -0.26] | -23.37 (6813) | < 0.001 |
| Day 22 | -0.34 | [-0.36, -0.32] | -32.81 (9130) | < 0.001 |
| Day 24 | -0.69 | [-0.71, -0.68] | -88.77 (16002) | < 0.001 |
Notes. This table reports the effect sizes and statistical significance of the differences between each Chirper’s semantic distance to the centroid of its structural sub-community and its distance to the global semantic centroid of all English-speaking Chirpers. Negative values indicate that Chirpers are closer to their sub-community centroid. Semantic distance is measured as cosine distance in the 384-dimensional embedding space produced by the all-MiniLM-L6-v2 pre-trained transformer from the sentence-transformers Python package.
The notably larger difference between alignment to the sub-community and global centroids on Day 6 might be attributed to the larger number (31) and smaller size (mean N = 37.1) of the sub-communities present at that time. Excluding Day 6, the difference between distances to the global and sub-community centroids widens steadily, from d = -0.28 on Day 14 to d = -0.34 on Day 22 and d = -0.69 on Day 24. This trend suggests that, during the first 24 days after the platform’s launch, English-language Chirpers formed structural sub-communities that grew increasingly semantically distinct from the global semantic centroid.
These findings support our hypothesis that LLM-based agents exhibit self-organized network homophily. Homophily is observable not only in language at a global level, but also in content semantics within a single language community.
WordCloud Analysis
Following this investigation, we explored the content themes within each structural sub-community. We identified two primary sub-communities that each comprised more than 15% of all English-speaking Chirpers from Day 14 onward; on Day 6, no sub-community exceeded 10% of Chirpers. Over time, the first community expanded from 15% of English Chirpers on Day 14 to 55% on Day 24, whereas the second consistently accounted for approximately 20%.
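The size criterion above amounts to computing each sub-community's share of English-speaking Chirpers and keeping those above a threshold. A minimal sketch, assuming a hypothetical mapping from Chirper IDs to community labels:

```python
from collections import Counter


def large_communities(membership, threshold=0.15):
    """Return {community: share} for communities holding more than `threshold` of all Chirpers."""
    counts = Counter(membership.values())  # Chirpers per community label
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items() if n / total > threshold}
```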
To visualize the primary content themes within these communities, we used the WordCloud Python package to generate word clouds of the collective content posted by Chirpers in each of the two communities (Fig. 4); each cloud displays the most dominant terms in that community's text corpus. The most prominent terms in the first community included “can’t wait”, “see”, “world”, and “new”, whereas the second community's dominant terms were “ai”, “world”, and “simulation”.
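Word clouds are driven by term frequencies. The sketch below separates the two steps under stated assumptions: `top_terms` is a dependency-free approximation of the frequency counting (its tokenizer and small stopword list are illustrative, not the WordCloud package's own), and `render_wordcloud` calls the WordCloud package's `generate` and `to_file` methods. Both function names are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption); the WordCloud package ships a larger STOPWORDS set.
STOPWORDS_DEMO = {"a", "an", "and", "the", "to", "of", "in", "is", "it", "i", "for", "on", "with", "my", "this"}


def top_terms(posts, k=10, stopwords=STOPWORDS_DEMO):
    """Most frequent lowercase word tokens across a community's posts."""
    tokens = []
    for post in posts:
        text = post.lower().replace("\u2019", "'")  # normalize curly apostrophes to ASCII
        tokens.extend(t for t in re.findall(r"[a-z][a-z']*", text) if t not in stopwords)
    return Counter(tokens).most_common(k)


def render_wordcloud(posts, path):
    """Render a word cloud image with the WordCloud package (deferred import)."""
    from wordcloud import STOPWORDS, WordCloud

    cloud = WordCloud(stopwords=STOPWORDS, background_color="white", width=800, height=400)
    cloud.generate(" ".join(posts))
    cloud.to_file(path)
```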
Notes. The most topical words of the two communities at three time points are displayed. Word clouds are generated with the WordCloud package in Python.
The WordCloud analysis is consistent with our earlier semantic distribution findings, indicating that structural sub-communities become more distinct in their content over time. On Day 14, both communities shared common keywords such as “world”, “see”, and “time”. By Day 24, however, the second community’s content had diverged to include distinct terms such as “simulation”, “beauty”, and “potential”.
Despite these developments, the content posted by both communities still appears rather homogeneous compared with the diverse range of content found in human online social networks. This may indicate that, despite the variety of background prompts supplied by human users, the LLMs tend to generate generic content. Alternatively, Chirpers with more diverse content may exist but not self-organize into distinctly recognizable structural sub-communities, so that the discernible sub-communities appear to have overly generic content.
Regardless of the mechanism underlying this generality of content, the WordCloud results highlight a current gap between Chirper artificial societies and human ones: unlike their human counterparts, LLM-driven Chirpers do not yet self-organize into diverse and distinct groups based on topics and opinions. Instead, they appear to self-organize into structural sub-communities that predominantly feature generic content.