Collectivism in a Replication Network of Minimal Nucleobase Sequences

A major challenge for understanding the origins of life is to explore how replication networks can engage in an evolutionary process. Herein, we shed light on this problem by implementing a network constituted by two different types of extremely simple biological components: the amino acid cysteine and the canonical nucleobases adenine and thymine, connected through amide bonds to the cysteine amino group and oxidation of its thiol into three possible disulfides. Supramolecular and kinetic analyses revealed that both self- and mutual interactions between such dinucleobase compounds drive their assembly and replication pathways. Those pathways involving sequence complementarity led to enhanced replication rates, suggesting a potential bias for selection. The interplay of synergistic dynamics and competition between replicators was then simulated in an open reactor with experimental kinetic data, showing the selective amplification of different species depending on the initial mixture composition. Overall, this network configuration can favor a collective adaptability to changes in the availability of feedstock molecules, with disulfide exchange reactions serving as 'wires' that connect the different individual auto- and cross-catalytic pathways.

Research on life's origins constitutes a major multidisciplinary effort to unravel the physicochemical means by which living systems could emerge from non-living matter. Many questions remain open in the field, with implications that are historical (how and where life originated), synthetic (how life can be synthesized from its basic molecular constituents), and conceptual (what essential features of living organisms allow characterization of their aliveness). [1-5] Systems chemistry is proving to be useful in this respect, as it adopts a holistic view for the study of complex chemical systems, wherein dynamic out-of-equilibrium reaction and self-assembly processes govern the system's emergent behaviors. [6-8] An important line in this area involves the development of chimeric systems that combine the properties of distinct biological building blocks, as a step towards replication, protometabolic networks and protocellular assemblies. [9-13]

In the endeavor to mimic DNA's capacity for replication or, more generally, the capacity of living cells to self-reproduce, different forms of replication have been developed with both synthetic and biological molecules. [14-18] The literature is rich in processes that display autocatalysis, either through the product's catalysis of its own formation, [19] through cyclic autocatalysis, [20,21] or in oscillatory reactions. [22] Most of these autocatalytic transformations [...] compartments. [32,38] This type of replication was likely widespread in prebiotic scenarios, but in order to trigger subsequent evolution, replicators would have to acquire additional capacities including: [39] (i) catalysis, to establish a supportive metabolism; [40] (ii) performance out of equilibrium; [35,41,42] (iii) compartmentalization, to avoid parasitic reactions and dilution effects; [43] or (iv) variability control with nucleobase sequences. [44] Important open questions with respect to nucleobase sequences are how simple their constituent monomers can be, and what minimal sequence length can drive the emergence of replication networks.

To shed light on those issues, herein we describe a new family of very simple exponential replicators emerging from monomers that display adenine or thymine (A and T), connected through amide bonds to the amino groups of cysteine (Figure 1). The role of the amino acid in these molecules is to link the nucleobases in a sequence, through oxidation of its reactive thiol into dynamic disulfide bonds. Despite the short length of these nucleobase sequences, supramolecular studies showed that they are able to control the self-assembly of the three formed species (AA, TT and AT), thus determining their replication efficiency. In-depth kinetic experiments and simulations were used to study how the resulting aggregates affect the irreversible auto- and cross-catalytic oxidation pathways of A and T, and the concomitant reversible disulfide exchange reactions. In spite of the low complexity of the studied replicators, both in terms of the monomer structure (much simpler than that of ribonucleotides) and of the sequence length (dimers), complementarity of nucleobases enhanced the replication rate for both the auto- and cross-catalytic pathways (AT and AA/TT, respectively), suggesting an adaptive potential that involves the interplay of different collective and competitive dynamic interactions between them.

[...] that did not permit estimation of the interlamellar distance. In any case, the formation of different assemblies for the three systems, including fibers and lamellar structures depending on concentration, points to a complex assembly landscape, with contribution from hydrogen bonding interactions, nucleobase π-π stacking and hydrophobic effects.
The involved self-assembly mechanisms will be examined in depth in subsequent studies but, since previous work has demonstrated the capacity of fibrillar and sheet assemblies to facilitate replication processes, [34,45] we assume these to be the catalytically active ones also in the present case.

[...] Importantly, the change in curve slope was observed at a product concentration that [...] autocatalysis in the absence of aggregates (below the cac). To confirm the products' autocatalytic nature, seeded experiments with 30% of AA or TT were conducted, maintaining the total concentration of starting materials in the same range as in the non-templated reactions. In both cases, a shortening of the induction period and an overall decrease in the reaction time were observed (Figure 3C, D; Figures S16A and S16B show the direct comparison of seeded and non-seeded experiments), indicating that the products do increase the reaction rate. This effect was less prominent for TT, probably due to its lower tendency to aggregate (higher cac).

Similar results were obtained when conducting the reactions from a mixture of A and T (2 mM each). Figure [...] AT. Experiments seeded with 20% of a previously finished reaction resulted in a shortening of the induction period for the three replicating species (Figures 3F, S16C and S16D). However, the scenario becomes significantly more complex when the two nucleobases are present, as disulfide exchange reactions can also occur. To study the role of these exchange processes in the global network kinetics, two different reactions were performed (4 mM T + 2 mM AA and 4 mM A + 2 mM TT) with HPLC monitoring (Figure S11). In the obtained kinetic curves, two stages could be distinguished, the first one corresponding to a predominant role of disulfide [...]

[...] h) between AA and T (G) or TT and A (H). Each panel shows the evolution of all involved species over time through experimental data (square data points) and fit curves, while the set of kinetic equations used for fitting is given in Table 1.

[...] rapidly if CT exceeds that value. An equivalence can therefore be assumed between 1/Keq and the cac, an assumption that is valid for any possible aggregation mechanism. In Figure S20 [...]

The rate equations for non-catalyzed (Eq-A1 and Eq-T1) and autocatalytic regimes (Eq-A2 and Eq-T2) were defined in a MATLAB program (Table 1, boxes 1 and 2), establishing that they operate below and above the cac, respectively. For autocatalytic processes, the mechanism of catalysis was not known, so equations with different orders with respect to both A (or T) and AAag (or TTag) were evaluated through fitting of the experimental data. The equilibrium constants of aggregate formation were considered through equations Eq-A3 and Eq-T3. The best fitting curves can be seen in Figures 3A-D (with R² above 0.99 in all cases), and correspond to a global order of three for the autocatalytic stage (Eq-A2 and Eq-T2) and an order of two with respect to the aggregated replicator. For a complete statistical treatment of fitting errors, [...] were obtained for a reaction order of two with respect to the replicating species (therefore higher than one) points to an exponential growth. [34,47] The orders obtained in the rate equations Eq-A2 and Eq-T2 actually imply that, in the 'catalytic' hybrid assemblies of monomeric thiol and disulfide dimer, the ratio between both required for the monomer to get activated towards oxidation is 1:2. Further studies will, however, be devoted to proposing a solid mechanistic scheme of this replication process. In any case, it is worth mentioning that the kinetic constants for the catalyzed reactions were about one order of magnitude greater than those for the non-catalyzed ones.
The calculated equilibrium constants (KAAag = 1.95 mM⁻¹ and KTTag = 1.6 mM⁻¹) in turn led to cac values of 0.51 and 0.63 mM, respectively, which are close to those obtained from DOSY experiments.
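The relations described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB model (the fitted equations belong to Table 1, which is not reproduced here). It only uses what the text states: cac ≈ 1/K, and an autocatalytic rate of global order three, first order in the monomer and second order in the aggregated replicator. Everything else is an assumption made for illustration: the function names, the rate constants `k_uncat` and `k_cat`, the first-order form of the uncatalyzed term, the approximation of the aggregated dimer as the excess of total dimer over the cac, and the choice to sum both rate terms instead of switching between regimes at the cac.

```python
def cac_from_equilibrium_constant(k_agg_per_mM):
    """Critical aggregation concentration (mM), taken as 1/K as stated in the text."""
    return 1.0 / k_agg_per_mM


def simulate_oxidation(monomer0, dimer0, cac, k_uncat=0.01, k_cat=0.1,
                       dt=0.01, t_end=200.0):
    """Explicit-Euler integration of 2 A -> AA (concentrations in mM, time in h).

    Hypothetical rate law: rate = k_uncat*[A] + k_cat*[A]*[AA_ag]**2,
    with [AA_ag] approximated as max(0, [AA] - cac).
    k_uncat and k_cat are placeholders, not fitted values.
    """
    t, a, d = 0.0, monomer0, dimer0
    trajectory = [(t, a, d)]
    while t < t_end:
        d_ag = max(0.0, d - cac)              # aggregated dimer (assumption)
        rate = k_uncat * a + k_cat * a * d_ag ** 2
        a -= 2.0 * rate * dt                  # two monomers consumed per dimer
        d += rate * dt
        t += dt
        trajectory.append((t, a, d))
    return trajectory


def time_to_reach(trajectory, dimer_target):
    """First simulated time at which the dimer concentration reaches the target."""
    for t, _, d in trajectory:
        if d >= dimer_target:
            return t
    return None


cac_aa = cac_from_equilibrium_constant(1.95)  # K_AAag from the text
cac_tt = cac_from_equilibrium_constant(1.6)   # K_TTag from the text
print(f"cac(AA) = {cac_aa:.3f} mM, cac(TT) = {cac_tt:.3f} mM")

# Unseeded run versus a run seeded with 30% of the cysteine units already as AA.
unseeded = simulate_oxidation(monomer0=4.0, dimer0=0.0, cac=cac_aa)
seeded = simulate_oxidation(monomer0=2.8, dimer0=0.6, cac=cac_aa)
print("time to 1.8 mM AA, unseeded:", time_to_reach(unseeded, 1.8))
print("time to 1.8 mM AA, seeded:  ", time_to_reach(seeded, 1.8))
```

In this toy setting the seeded run starts above the cac, so the second-order term is active from the beginning and the target conversion is reached earlier, which mirrors qualitatively the shortened induction period reported for the seeded experiments.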
A similar procedure was applied for calculation of disulfide exchange kinetic constants, considering four possible reactions/equations (R-E1 to R-E4 / Eq-E1 to Eq-E4; Table 1, box 3; fitting curves in Figure 3G, H) and all previously calculated constants (kA, kT, kAc, kTc, KAAag and KTTag). The constants resulting upon fitting (ke1, ke2, ke3, ke4) are approximately one order of magnitude larger than those for the autocatalytic oxidation of monomeric thiols, which makes their reaction rates comparable. Concerning the analysis of replication from mixtures of both nucleobase monomers, the landscape of non-covalent assembly pathways is more complex than for single replicators. In addition to the self-assembly of AA and TT, two other aggregate types must be considered, resulting from either complementary interaction between AA and TT (AATTag) or from self-pairing of AT (i.e., ATag). These aggregates enable auto- and cross-catalytic reactions, as AATTag can aid in reactions producing AA and TT (R-A4 and R-T4, respectively; Table 1, box 4), and ATag can assist in its own formation (R-AT2) (Table 1, [...] in the autocatalytic regime were used as previously calculated.

Table 1. Kinetic analysis of the replication network. Boxes 1 and 2 concern the irreversible [...]

This global analysis of the network kinetics revealed interesting aspects of its behavior. As expected, for statistical reasons related to the number of pathways to its formation, AT is produced twice as fast as AA or TT. More importantly, the auto-/cross-catalytic reactions were ~4x faster, and the aggregation constants 2x higher, when there is complementarity between nucleobase sequences (i.e., for AT and AA/TT) than when there is not (i.e., for AA or TT).

[...] concentration (from 0 to 4 mM), was then tested to determine which steady state would be reached in each case. The result was a 3D surface (Figure 4D) that marks the boundary between initial conditions that favor SSAT (below the surface) or SSAA/TT (above the surface).

Interestingly, the threshold below which AT always gets amplified was ~1 mM; above that threshold, SSAA/TT was favored except when the initial molar fraction of AT was higher than 0.8. The capacity of a dynamic system to reach two different steady states depending on the initial conditions is called bistability, and in the present case it seems to be related to the possible interconversion of the replicating species through disulfide exchange. To test this possibility, the network behavior was simulated in a hypothetical scenario where the exchange reactions were kinetically frozen, artificially reducing the values of their kinetic constants by six orders of magnitude compared to the experimental data (see Table S4). In that situation, the dominant species in the reached steady state correlated exclusively with the replicating system (AT or AA/TT) initially present (Figure 4E and 4F). In addition, the concentration of the 'losing' replicative system drops almost to extinction. The reason for this is that, in the absence of exchange reactions, the dominant catalytic species gets amplified sufficiently quickly to consume all the substrates fed into the reactor. This simulation thus highlights the importance of disulfide exchange as wiring reactions that connect the different auto- and cross-catalytic pathways, endowing the whole replication network with a collectively better adaptability, as it can switch from one replicating [...]

[...] [49] Strecker-derived chemistry [50] and HCN/cyanoacetylene oligomerization reactions. [51] Although for practical reasons our synthesis of A and T was performed following standard organic synthesis techniques, the chemistry of amide and disulfide bond formation/exchange has been extensively studied in prebiotic contexts. [2] More importantly, the molecular complexity of the replicators AA, TT and AT is significantly lower than that of other replicator families reported to date, suggesting that the structural requirements for chemical evolution to step into replicating species were probably not so high. The present cysteine-based derivatives do not need an oligopeptide or lipid chain to drive their self-assembly and replication processes. On the other hand, the smallest nucleic acid template replicators previously described required a minimum sequence of 6 nucleotides. [15] As a merger of both approaches, the self-assembly of AA, TT and AT must be mostly promoted by H-bond interactions between nucleobases, together with π-π stacking and hydrophobic effects. Although this makes their self-assembly weaker and their replication rates slower than those of previous peptide- or lipid-based replicators, they gain a rudimentary sequence-based control of the replication process. Building on such a capacity, the network described herein presents a collective behavior that can provide significant adaptability between the individual synergistic and 'selfish' replication pathways, aided by exchange reactions that allow interconversion between the different replicator species.
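The frozen-exchange scenario discussed above, in which the initially present replicating system takes over the reactor, can be illustrated with a deliberately simplified toy model. The Python sketch below is not the kinetic model of Table 1 and uses none of the fitted constants. It assumes two generic replicators X and Y that grow with order two in themselves from a shared feedstock S in a flow reactor, with no interconversion between them. The function name, the rate constant `k`, the flow rate `flow`, the feed concentration `s_in` and the 1:1 stoichiometry are all hypothetical.

```python
def flow_reactor(x0, y0, s_in=4.0, k=1.0, flow=0.05, dt=0.01, t_end=300.0):
    """Toy open reactor with two second-order replicators and no exchange.

    dX/dt = k*S*X**2 - flow*X
    dY/dt = k*S*Y**2 - flow*Y
    dS/dt = flow*(s_in - S) - k*S*(X**2 + Y**2)
    All parameters are illustrative placeholders (mM and h as nominal units).
    Integrated with an explicit Euler scheme.
    """
    x, y, s = x0, y0, s_in
    steps = int(round(t_end / dt))
    for _ in range(steps):
        growth_x = k * s * x * x      # autocatalytic growth of X
        growth_y = k * s * y * y      # autocatalytic growth of Y
        dx = growth_x - flow * x      # growth minus outflow
        dy = growth_y - flow * y
        ds = flow * (s_in - s) - growth_x - growth_y
        x += dx * dt
        y += dy * dt
        s += ds * dt
    return x, y, s


# Same reactor, opposite initial composition.
x_end, y_end, _ = flow_reactor(x0=0.3, y0=0.1)
print("X initially dominant -> X, Y at the end:", x_end, y_end)
x_end2, y_end2, _ = flow_reactor(x0=0.1, y0=0.3)
print("Y initially dominant -> X, Y at the end:", x_end2, y_end2)
```

With identical rate constants, the only difference between the two runs is the initial composition, yet each run ends dominated by the replicator that started in excess. This is the qualitative behavior described for the simulation without exchange; reproducing the switching between steady states would require adding the interconversion reactions, which this sketch omits on purpose.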

Complete procedures for the synthesis and characterization of the network components are described in the SI. [...] prepared and, while being stirred, the corresponding percentage of seed (20/30% of cysteine) from a finished reaction was added, and the reaction was kept stirring at 600 rpm and 20 °C, followed by HPLC monitoring. Each experiment was repeated twice. [...] gradient of water to acetonitrile for 15 min. The different species were identified with a single quadrupole mass detector and quantified with a UV-Vis detector (λ = 260 nm).