Evolution of the Earth Microbial Co-occurrence Network


 Background: Co-occurrence pattern provides vital insight into complex microbial interactions of microbiomes. Although network analysis offers useful tools for describing microbial co-occurrence pattern, evolution of co-occurrence networks remains largely uncharacterized. Here, we simulated the evolution of the Earth microbial co-occurrence network and estimated topological fitness of its nodes based on the degree growth exponent.Results: We showed that the Earth microbial co-occurrence network evolved following Bianconi-Barabasi model. The Earth microbial co-occurrence network had reached to a stable status with around 500 nodes. Degree growth exponent was the major determinant of accumulated degree of taxa. The positive correlation between topological fitness and gene numbers in corresponding genomes suggests the intrinsic feature of topological fitness. The gamma distribution of topological fitness suggests the extinction of taxa with low topological fitness. We then examined the impact of node extinction and decay, finding that the link acquisition of hub nodes was not affected.Conclusions: This study glimpses the evolution feature of Earth microbial co-occurrence network and provides a framework for predicting potential hubs in the evolving network in future.

ecological processes governing community structure, such as niche filtering and habitat 48 preference [4]. 49 50 Microbial interaction networks evolve simultaneously with the evolution of microbial taxa 51 through time as they are the product of changing taxonomic composition and metabolism [5]. 52 Given that microbial interactions lie at the heart of microbial macroevolution [6], microbial 53 interaction networks might change in predictable ways during macroevolutionary timescales 54 [7]. While we know that microbial interaction networks have particular structural features, the 55 process of network change during evolution is not well known. Even though historical data on 56 microbial interactions in the fossil record are seldom available, phylogenetic trees generated 57 from marker genes might give insight into how microbial interactions change over 58 macroevolutionary timescales [8]. To access how a microbial interaction network evolves 59 using phylogenetic trees, we can track evolution dynamics by integrating microbial interaction 60 network evolution with phylogenetic tree speciation events (Fig. 1). Although a short timescale 61 study may not provide strong hypotheses testing to link microbial interactions with 62 macroevolution, the evolution of microbial networks over a large spatial scale that has 63 experienced long-term evolution provides a novel perspective on the evolution of microbial 64 biodiversity. 65 66 Network evolution scenarios such as preferential attachment have been proposed to explain the 67 emergence of hub nodes and a scale-free degree distribution in networks, with the Biaconi-68 Barabási model suggesting that fitness plays a critical role in network evolution and topology 69 [9] [10]. Node topological fitness represents the intrinsic qualities of a node that influence the 70 rate at which it acquires links. In many real-world networks, such as those that describe 71 webpages, social interactions, and academic article citations, it has been shown that a high 72 degree of connectivity is determined by high node topological fitness, even if the node arrives 73 late to the network [11][12][13]

Simulation of the evolution of Earth microbial co-occurrence network 114
To simulate Earth microbial co-occurrence network evolution, we first built a phylogenetic 115 tree for the ESVs involved in the network. The corresponding 16S rRNA gene segment 116 sequences were retrieved from EMP and aligned using MAFFT [15]. A maximum likelihood 117 (ML) phylogenetic tree was estimated with RAxML [16]. Node ages in the tree were dated 118 using the mean path lengths method function chronoMPL of ape package [17]. Network 119 evolution was then simulated by merging nodes according to the divergent sequences of the 120 phylogenetic tree (Fig. 1). Parent node links were determined as the intersection of two 121 descendant node links. This process was repeated until the network combined into a single 122 node representing the root ancestor of the phylogenetic tree. After each step, we estimated the 123 degree exponent (l), average separation (d), and clustering coefficient (C) for the evolved 124 network. 125 126

Topological fitness estimation 127
Microbial co-occurrence network evolution was described by the Bianconi-Barabási model 128 [10]; this model assumes that the formation of microbial interactions is driven by the product 129 of a taxon's topological fitness η and its degree k. This fitness is a taxon's capability to form 130 interaction relationships with other taxa. In each time step, a new taxon j with fitness ηj and m 131 interaction links joins the network. The probability that an interaction link of new taxon j 132 directed to taxon i is proportional to the production of taxon i's fitness ηi and its degree ki: 133 The dependence of Πi on ki and ηi implies that the taxa with more interactions and higher 135 fitness are more likely to build new interactions. Hence, this model assumes that even a new 136 taxon with only a few interactions can build interactions rapidly if it acquires better fitness 7 during evolution. A taxon's fitness η is proportional to its degree growth exponent β(η), a 138 parameter indicating the increase rate of link numbers, which is different to biological growth 139 rate of microbial taxon such that, 140 Accordingly, the fitness of a taxon can be reflected by estimating the degree growth exponent 144 where m is initial degree, t is the age of network, and ti is the time of node i was added to the 147 network. From Eq. 3 we get the solution of β(ηi): 148

Genome size dataset 151
We downloaded the genome size data from the NCBI Refseq genome database 152 (https://www.ncbi.nlm.nih.gov/refseq) and calculated the mean genome size for each genus. 153 154

Impact of extinction on the fitness estimation 155
The fitness model was extended to account for node deletion to evaluate the impact of 156 extinction on fitness estimations [11]. At each time step in this model, a randomly selected 157 node (with probability c) and its links were deleted. The degree growth exponent β is a 158 function of fitness η: 159 where A is given by: 161 and ηmax is the maximum fitness in the system and e is negligibly small. Thus, the degree 163 growth exponent βdecay is given by: 164 The impact of the extinction on the fitness estimation was tested for c between 0.0 and 0.9 167 using intervals of 0.1. 168 169

Impact of fitness aging on the fitness estimation 170
Microbial co-occurrence network taxa continue to age as their fitness fades with time due to 171 the spread of novel functional genes in the microbial community through horizontal gene 172 transfer. This aging process is captured in the fitness estimation by a tunable parameter v 173 [18], which governs the dependence of attachment probability on the taxon's age: 174 From Eq. 4 and Eq. 8 we get the solution of βdecay: 176 When v < 0, a new taxon will prefer to interact with older taxa. In the extreme case of ®∞, 178 a new taxon will only interact with the oldest taxon. In contrast, when v > 0, a new taxon will 179 prefer to interact with younger taxa. When v > 1, the aging effect overcomes the role of 180 preferential attachment, leading to the loss of the scale-free property. When v = 0, the aging 181 effect is absent. The impact of aging on the fitness estimation was tested by setting v at -1, 0.

Earth microbial co-occurrence network evolved following Bianconi-Barabasi model 186
To reveal the association pattern of the Earth microbial co-occurrence network, we firstly 187 compared its degree distribution with that of a random network generated based on Erdos-188 Renyi model and with scale-free networks generated based on Barabasi-Albert or  Barabasi models. The degree distribution of the Earth microbial co-occurrence network was 190 most similar with the scale-free network simulated with the Bianconi-Barabasi model (Fig. 2). 191

192
We then simulated the evolution of the Earth microbial co-occurrence network (Movie S1) by 193 merging two descendant nodes into a parent node based on the divergence order of the 194 phylogenetic tree. The network was synchronously evolved in different environments (Fig. S1) 195 and taxonomic groups (Fig. S2). The degree exponent (l), an indicator for the scale-free 196 property of the network, decreased fast from 6 to 3 and sluggishly converged to a stationary 197 value around 2 during simulated evolution (Fig. 3). Average separation (d), denoting the 198 average length of the shortest path over all pairs of nodes, characterizes network 199 interconnectedness. Stationary d values after approximately 500 steps suggest a stationary 200 small-world property of the network (Fig. 3). The clustering coefficient (C), a measure of the 201 degree to which the nodes in a graph tend to cluster together, also exhibited a decreasing 202 tendency and converged to a stationary value around 0.45 after approximately 500 steps (Fig.  203 3). 204 205

Degree growth exponent estimation 206
The Bianconi-Barabasi model assumes that nodes have a different probability of forming 207 interactions, which we refer to as "topological fitness" (η). Given that node topological fitness 208 is related to degree growth exponent (β) by a linear transformation with constant coefficients 209 (see Eq. 3 in the Methods), we assessed topological fitness by estimating β from Eq. 4. The β 210 values of the 50 nodes with the largest degree demonstrate that the degree increased at various 211 rates, supporting that the Earth network evolution follows the Bianconi-Barabasi model. The 212 distribution of β (Fig. 4a) shows that it follows a gamma distribution, indicating that 213 topological fitness of microbial taxa varies in a relatively narrow range and that nodes with 214 high fitness and low fitness are rare. The accumulated degree (kt) increased with β (Fig. 4b), 215 suggesting that node topological fitness determines its degree in the microbial co-occurrence 216 network, representing the intrinsic node qualities that influence the rate at which nodes acquire 217 links (for instance, the ease of pathway loss in response to leaky metabolite production). 218 Moreover, β values were not affected by node age (Fig. 4c), which also demonstrates that the 219 network does not follow the Barabasi-Alberta model. 220

Impact of node extinction on topological fitness 232
To address the impacts of extinction of microbial taxa when estimating β, we employed the 233 node deletion model (Eq. 7), which considers the node turnover rate (c). It is clear from this 234 model that the impact of extinction was strong on small β values. Taxa became extinct when 235 their topological fitness was lower than a threshold; these threshold values increased with the 236 turnover rate (Fig. 5a). 237 238

Impact of node decay on topological fitness 239
To illustrate the impact of the aging effect on node topological fitness, we introduced a decay 240 rate (v) into the model (Eq. 9). The fitness decreased with time when v > 0, and vice versa. 241 Given that there is a scale-free feature and decaying fitness in this network, we can predict that 242 The v values were decreasingly skewed when 0 < v < 1, especially for smaller β 243 values. The skewness level was increased by tuning the v value from 0.2 to 1 (Fig. 5b). The second assumption deserves further exploration, while the last assumption is likely 253 unrealistic given the fast emergence of ecological interactions in response to altered conditions 254 [19]. In addition, the phylogenetic tree and microbial co-occurrence network are likely of low 255 accuracy as they are both inferred. Regardless, the simulation is not meant to recapitulate the 256 evolution of the Earth microbial co-occurrence network, but rather to investigate how evolved 257 network properties relate to different assumptions about network evolution. The comparison of 258 different evolutionary scenarios helps to understand how these network properties arise. In the 259 future, more realistic evolutionary scenarios will be tested and the resulting network properties 260 compared to observed properties; the simple model applied here is a first step in this direction. 261

262
Although the Earth microbial co-occurrence network evolved into a complex system consisting 263 of 2,928 nodes in our simulations, its topological features were stationary after evolving into a 264 network consisting of approximately 500 nodes. The decreasing λ values during the initial 265 stages, from 6 to 2, suggests that the metacommunity scale-free network evolved from a 266 random network without network hubs [20]. We speculate that early microbial communities 267 did not yet have many interactions or that they were unable to form hubs, but given the 268 methodological limitations, we cannot investigate this question further. If extinct branches 269 were able to be included, we expect that λ would stabilize later and decrease more gradually. 270 The sluggishly decreasing tendency of λ values after λ < 3 indicates that the network hubs 271 attracted more links during simulated network evolution as kmax ~ N1/λ-1 in a scale-free network 272 [18]. Since λ converged to 2, the hubs tend to associate with a large proportion of nodes and 273 the network tends to reduce the distance between any two nodes. This relationship also explains 274 why the average distance was maintained at a low level when λ reached a stationary stage. Note 275 though that λ values should not be lower than 2 as self-loops and multiple links were not 276 counted [18]. 277 Average degree increased with β in the Earth microbial co-occurrence network. However, there 279 was a large degree of variation for nodes with the same β which was caused by different node 280 emergence times. Small differences can escalate over time and a distinction in degree for nodes 281 with slight β differentiations can became quite significant into the future. The gamma 282 distribution of β suggests that both high and low topological fitness nodes were rare and that 283 the β values for most nodes were ~1. Rare nodes with high topological fitness were expected 284 due to the scale free feature of the Earth microbial co-occurrence network. Simultaneously, the 285 decreasing number of nodes with fitness when β < 1 indicates that there are few nodes with 286 small topological fitness values. One explanation for the deficiency in nodes with small fitness 287 values is that lower fitness leads to node deletion from the network as microbial species cannot 288 compensate for a loss of associations by gaining new associations. We examined the impact of 289 extinction on fitness estimation and proved that the impact of extinction indeed increased as 290 fitness decreased. Another possible explanation is that rare biospheres were not included in the 291 present Earth microbial co-occurrence network; however, although rare biospheres might play 292 a critical role in microbial communities [21], the topological fitness of rare biospheres is 293 expected to be small due to their limited chances of interacting with other taxa. 294 295 Our model does not provide evidence that topological fitness has genetic roots or that it is 296 linked to fitness in the evolutionary sense [7]. However, if such a link exists, natural selection 297 would enhance topological fitness through mutations, genetic recombination, gene loss and 298 horizontal gene transfer events [5]. Accordingly, the gene profile of a microbial genome might 299 describe the essential aspects of topological fitness in an evolving microbial ecological 300 network. However, we assume that gene loss is a rare interaction mechanism, and that 301 topological fitness is reduced over time by vertical transmission of interaction mechanisms to 302 descendants, leading to a "dilution" of topological fitness [22]. Topological fitness decay with 303 node age shows an aging effect (v > 0) of nodes in the Earth microbial co-occurrence network; 304 gradual aging homogenizes the network by impeding the older hubs, which are less likely to 305 form new links. However, given the merging strategy used, an aging effect is the only possible 306 outcome when v > 0 as older nodes can only have higher fitness than younger nodes when v < 307 0. Similar to the impact of extinction, the impact of aging also increased as topological fitness 308 decreased. In this case, nodes with small topological fitness values failed to persist in the 309 network when the topological fitness was smaller than a threshold value. This phenomenon is 310 also supported by the distribution of β as many nodes are absent when β < 1. Note though that 311 the definition and measurement of topological fitness in a microbial ecological network is not 312 straightforward; many interaction mechanisms have not yet been deciphered and genomic 313 information for most microbial species is unavailable due to a failure to culture the dominant 314 microbial species in many environments [23]. There remains further knowledge gaps in the 315 association between functional gene profiles and these networks. 316

317
We have not inferred the timing of divergence events because bacteria and archaea lack 318 sufficient evidence (such as a comprehensive fossil record) to calibrate the phylogenetic tree. 319 Whole genome sequences provide a more reliable resource to dating than the molecular clock, 320 but a greater effort is needed to sequence the genomes in uncultured microbial "dark matter". 321 Other life on Earth (for example viruses, plants, animals, and fungi) is also essential to the 322 microbial interactome due to their influences on microbial environments, and we anticipate 323 that novel studies within the EMP framework will fill the gaps for microbial eukaryotes and 324 viruses. Given the increasing recognition of the value of communal microbial biodiversity 325 monitoring and the current global advance in sequencing efforts, we can infer in the future an 326 ever more comprehensive global microbial co-occurrence network from the rapid accumulation 327 of microbial sequence data. 328 329

Conclusions 330
The extent of our understanding of microbial communities is reflected in our ability to predict 331 future trends. Given the vital role of microbial interaction in community assembly, predicting 332 potential interaction features and tendency is important for microbial community 333 controllability. In this study, we simulated the evolution of the Earth microbial co-occurrence 334 network following divergent sequences in the phylogenetic tree generated from EMP datasets, 335 and assessed topological fitness of nodes in the network by estimating the degree growth 336 exponent. Our study provides a new framework for predicting potential tendency of evolving 337     Table S1. Degree growth exponent (β) of dominant genera. 383 Figure S1. The divergence order of nodes from different environments. 384 Figure S2. The divergence order of nodes from different taxonomic groups. 385 Movie S1. Evolving Earth co-occurrence network. The green nodes and links highlight the 386 latest 20 nodes and associated links. 387     Figure S1. The divergence order of nodes from different environments. 466 t 0 t 2928 Divergence order  Please see the Manuscript PDF le for the complete gure caption