The formose reaction is a prebiotically plausible model system of sugar-forming reactions that uses formaldehyde as a C1 building block (Fig. 1a).20–22 It broadly consists of five reaction types: enolate formation, enolate protonation, aldol addition, retro-aldol reaction and the Cannizzaro reaction (Scheme S1). Much of its core reactivity is catalysed by hydroxide and by divalent metal ions such as Ca2+.23 Conceptually, any given monosaccharide or enol(ate) in the formose reaction may be converted into another by applying these reaction types. Therefore, a range of monosaccharides may be used as feedstocks to initiate the reaction.
Unconstrained, recursive application of the limited set of reaction classes operative in the formose reaction produces a so-called combinatorial explosion of compounds (Fig. 1a). A number of studies have explored ways of using thermodynamic constraints to contain the potential generation of intractable mixtures in the formose reaction.24–26 However, relatively few data have been collected that would allow the formose reaction to be rationalised comprehensively under out-of-equilibrium conditions.22,26–30 Such conditions are a key characteristic of living systems and are highly relevant to the prebiotic Earth,12 on which conditions were dynamic and modulated on a variety of timescales. In out-of-equilibrium chemical reactions, kinetic, rather than thermodynamic, properties govern the reaction behaviour and product distribution.11 Therefore, molecular reactivity is a prime controller of such systems.
Here, out-of-equilibrium conditions were induced in the formose reaction by running it under flow in a continuous stirred-tank reactor (CSTR, Fig. 1b). The compositional and reaction-connectivity responses of this model system to variations in an overarching environment were studied. The environment, a collection of 17 input variables, included the concentrations of formaldehyde, CaCl2 and NaOH, as well as the temperature and the identity of the initiating sugar (glycolaldehyde, dihydroxyacetone, erythrulose or ribose; Extended Data 1–4 and Supplementary Information 2).
Experiments were performed to measure the steady-state compositions of the formose reaction. In addition, experiments were performed in which the input concentrations of the initiating sugars were modulated sinusoidally. Measuring how this input modulation is transferred to the product compounds (Extended Data 4) provided a basis for estimating the underlying reaction pathways of the formose reaction.16–18
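The idea that modulation amplitudes report on how closely a product is coupled to a modulated input can be illustrated with a toy simulation. The sketch below is not the authors' analysis: it assumes a hypothetical linear chain of first-order conversions X0 → X1 → X2 → X3 in a CSTR, with X0 fed at a sinusoidally modulated concentration, and all rate constants, the residence time and the modulation parameters are illustrative values. It integrates the mass balances with an explicit Euler step and reports each species' relative amplitude (oscillation amplitude divided by mean concentration) at the periodic steady state.

```python
import math


def simulate_cstr_chain(n_species=4, k=0.05, tau=20.0, c_in=50.0, depth=0.2,
                        period=200.0, dt=0.05, t_end=2000.0):
    """Integrate a hypothetical chain X0 -> X1 -> ... -> X_last in a CSTR.

    X0 is fed at c_in * (1 + depth * sin(2*pi*t / period)); every species is
    washed out with residence time tau; each species except the last is
    converted into the next with first-order rate constant k (all values are
    illustrative assumptions). Returns concentrations over the final two periods.
    """
    omega = 2.0 * math.pi / period
    x = [0.0] * n_species
    record = []
    n_steps = int(round(t_end / dt))
    for step in range(n_steps):
        t = step * dt
        feed = c_in * (1.0 + depth * math.sin(omega * t))
        dxdt = []
        for i in range(n_species):
            inflow = feed / tau if i == 0 else k * x[i - 1]
            conversion = k * x[i] if i < n_species - 1 else 0.0
            dxdt.append(inflow - x[i] / tau - conversion)
        # Explicit Euler step (dt is far below the fastest time constant, 10)
        x = [xi + dt * di for xi, di in zip(x, dxdt)]
        if t >= t_end - 2.0 * period:
            record.append(x)
    return record


def relative_amplitudes(record):
    """(max - min) / (max + min) for each species, i.e. amplitude / mean."""
    result = []
    for i in range(len(record[0])):
        series = [row[i] for row in record]
        hi, lo = max(series), min(series)
        result.append((hi - lo) / (hi + lo))
    return result


amps = relative_amplitudes(simulate_cstr_chain())
print([round(a, 3) for a in amps])
```

Because each first-order step acts as a low-pass filter on the oscillation, the relative amplitude falls from one species to the next along the chain; this is the property that allows measured amplitudes to be used to rank how directly products are connected to a modulated input.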
The contents of the CSTR were sampled continually from its outflow. Following appropriate derivatisation,31–33 samples were analysed by GC-MS and HPLC (Fig. 1c; see Materials and Methods for full details). Analysis of the chromatographic peaks and mass spectra provided a compositional pattern for each sample (Extended Data 3; five examples are shown in Fig. 2a), comprising varying amounts of the 52 compounds detected across the data set.
To visualise the relationships between the average compositions and kinetic signatures generated for each condition, hierarchical clustering was performed using a correlation-based pairwise dissimilarity metric (Materials and Methods). The resulting dendrogram (depicted qualitatively in Fig. 2b; see also Fig. S2) represents the similarity relationships between reaction outcomes. Pie charts placed at the 'leaf' positions represent normalised average product distributions, and longer paths between leaves represent lower similarity. The inset panels indicate how the values of key environmental parameters vary across the branches of the dendrogram.
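A minimal sketch of the clustering step is given below, using only the Python standard library. It assumes that each condition is summarised by a vector of product amounts, uses 1 − Pearson correlation as the pairwise dissimilarity and, as an assumption because the linkage method is not specified here, average linkage. The four composition vectors are invented for illustration, and the kinetic signatures that were also clustered in the study are omitted.

```python
import math


def normalise(amounts):
    """Scale a composition vector so that its entries sum to 1."""
    total = sum(amounts)
    return [a / total for a in amounts]


def correlation_dissimilarity(u, v):
    """1 - Pearson correlation: 0 for identical patterns, up to 2 for opposite ones."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 1.0 - cov / (su * sv)


def average_linkage(profiles):
    """Agglomerative clustering; returns (members_a, members_b, height) per merge."""
    n = len(profiles)
    dist = {(i, j): correlation_dissimilarity(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}
    clusters = {i: [i] for i in range(n)}
    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        best = None
        for idx, a in enumerate(keys):
            for b in keys[idx + 1:]:
                # Average linkage: mean dissimilarity over all cross-cluster pairs
                pairs = [(min(p, q), max(p, q)) for p in clusters[a] for q in clusters[b]]
                height = sum(dist[pq] for pq in pairs) / len(pairs)
                if best is None or height < best[2]:
                    best = (a, b, height)
        a, b, height = best
        merges.append((tuple(clusters[a]), tuple(clusters[b]), height))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


# Hypothetical product amounts (arbitrary units): four conditions x five compounds
raw = [
    [50, 30, 10, 5, 5],
    [45, 35, 10, 5, 5],
    [5, 5, 20, 30, 40],
    [5, 10, 15, 30, 40],
]
profiles = [normalise(r) for r in raw]
for members_a, members_b, height in average_linkage(profiles):
    print(members_a, "+", members_b, f"at dissimilarity {height:.3f}")
```

Because the Pearson correlation is unchanged by rescaling a vector, normalising each composition to fractions (as for the pie charts) does not alter the dissimilarities; the merge heights correspond to the path lengths read off a dendrogram.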
Each branch of the dendrogram arises from a dominant environmental factor. Fine-tuning of the compositions within branches results from the combined influence of additional condition variables. For instance, branch I is characterised by relatively low concentrations of formaldehyde (1, ≤ 50 mM) and inputs combining glycolaldehyde (2), dihydroxyacetone (3) and erythrulose (9). Its constituent compositions are distinguished by relatively high amounts of C6 compounds (green sector hues). Following branch I from its tip towards the centre of the tree, more diverse sets of compounds are produced, including both branched and linear C4 and C5 compounds (red hues).

Interestingly, varying the concentration of 1 (with fixed inputs of 3, CaCl2 and NaOH of 50, 15 and 30 mM, respectively) results in a series of compositional transitions (Fig. 2c), which manifest as a series of 'jumps' of varying magnitude across the dendrogram (Fig. 2d). Beginning in branch I ([1] ≤ 50 mM), the compositions consist mostly of an α-hydroxymethyl-aldohexose (32) and an α,β-(hydroxymethyl)-aldotetrose (37). Within the series of experiments with [1] ≤ 50 mM, the composition remains in branch I, but the molecular diversity increases as [1] increases: the contributions of 32 and 37 to the reaction mixture decrease in favour of α-hydroxymethyl-glyceraldehyde (7), 9 and ribulose (20). Increasing [1] to 50 mM results in a jump towards the centre of the tree, suggesting the onset of a more significant compositional transition; compounds 7, 20 and an α-hydroxymethyl-aldotetrose (14) become prominent. Further increasing [1] above 50 mM results in a significant compositional transition from branch I to branch VII, reflecting an increase in the molecular complexity of the system. Threose (10), lyxose (18), two 3-ketohexoses (29, 30) and a new α-hydroxymethyl-aldopentose (34) appear in the composition, whereas 32 and 37 are almost completely depleted.
Thus, the concentration of the feedstock 1 controls a thresholded compositional transition whose dynamic range lies within [1] = 0–100 mM.
Ca2+ and hydroxide are involved in several of the reaction types of the formose reaction. Ca(OH)2 has been reported to be a more active catalyst of the formose reaction than the hydroxides of Sr2+ or Ba2+.34 Fe(OH)3 has also been reported to catalyse the formose reaction.35 Although definitive characterisation is not yet available, Ca2+ is generally understood to bind divalently to enolates34,36 and may help to organise intermediates in aldol addition reactions. The principal role of hydroxide is α-proton abstraction to form enolates from monosaccharides, which has been identified as a key rate-limiting step in the reactions of this class of compounds.37–39 Hydroxide also participates as a reactant in Cannizzaro reactions.
A range of [Ca2+]:[NaOH] input ratios (all below the solubility limit of Ca(OH)2) was explored while the concentrations of 1 and 3 were held fixed. A representative subset of the data (Fig. 2e) spans three branches of the dendrogram (II, III and VII; Fig. 2f). At low [Ca2+]:[NaOH] (branch VII), the compositions resemble those described above for the high-[1] regime. Increasing the [Ca2+]:[NaOH] ratio induces a jump from branch VII to branch II, owing to increases in the proportions of 9 and 18 relative to 7. Increasing the ratio further lowers the populations of 7 and 18, eventually giving a composition dominated by 9. Notably, the compositions recorded for [NaOH] = 2.5 mM and [Ca2+] = 20–52 mM are very similar, all being dominated by 9. Furthermore, at [CaCl2] = 52 mM and [NaOH] = 20 mM, erythrose (8) and 20 are particularly prominent.
Other environmental conditions, such as the identity of the initiating sugar, also lead to distinctive compositions. Branches IV, V and VI result from using 9, ribose (19) and the dimer of 2, respectively, as initiators. When the reactor temperature was varied from 10 to 40 °C (branch VII), the reaction composition remained remarkably constant. At 10 °C, the concentration of 10 is relatively low, and the concentrations of 18 and 20 diverge slightly with increasing temperature. Therefore, the influence of temperature on the steady-state composition of the formose reaction is modest over the range investigated.
The observed compositional variations are a direct result of the translation of the input conditions through the underlying formose reactions. To elucidate the structure of the self-organised reaction networks, we combined the principle of input-modulation transfer through a reaction network,16–19 rule-based reaction network generation40–42 and graph searching.43 This framework translates the experimental data directly into a descriptive set of reactions responsible for the observed compositions (see Materials and Methods for details). Briefly, a reaction network was generated by recursively applying a set of reaction rules to an initial set of compounds (1, the dimer of 2 and NaOH), and this network served as the basis for a set of pathway searches. Periodic perturbations applied to reactants in reaction systems have been shown to be transferred to products and co-reactants.16–19 The amplitudes of the resulting modulations in compound concentration decrease with increasing distance, in the reaction network, from the modulated input.
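The recursive rule application described above can be sketched as a closure computation. In this toy version, which is not the rule-based generator used in the study, molecules are abstracted to their carbon counts, only two hypothetical rules are applied (an aldol addition that joins two chains and a retro-aldol cleavage that splits one), and chain length is capped at six carbons to keep the network finite. Starting from C1 and C2 (standing in for 1 and 2), the rules are applied generation by generation until no new compounds appear.

```python
from itertools import combinations_with_replacement

MAX_CARBONS = 6  # hypothetical cap on chain length, keeps the toy network finite


def aldol_rule(a, b):
    """Toy aldol addition: two chains of a and b carbons join into one chain."""
    if a + b <= MAX_CARBONS:
        yield ("aldol", (a, b), (a + b,))


def retro_aldol_rule(a):
    """Toy retro-aldol: a chain of a >= 2 carbons splits into two fragments."""
    for k in range(1, a // 2 + 1):
        yield ("retro-aldol", (a,), (k, a - k))


def generate_network(seeds, max_generations=10):
    """Apply the rules recursively; return {compound: generation} and reactions."""
    compounds = {c: 0 for c in seeds}
    reactions = set()
    frontier = set(seeds)
    for gen in range(1, max_generations + 1):
        new = set()
        for a, b in combinations_with_replacement(sorted(compounds), 2):
            if a not in frontier and b not in frontier:
                continue  # this pair was already tried in an earlier generation
            for rxn in aldol_rule(a, b):
                reactions.add(rxn)
                new.update(p for p in rxn[2] if p not in compounds)
        for a in frontier:
            for rxn in retro_aldol_rule(a):
                reactions.add(rxn)
                new.update(p for p in rxn[2] if p not in compounds)
        if not new:
            break  # closure reached: no rule produces an unseen compound
        for c in new:
            compounds[c] = gen
        frontier = new
    return compounds, reactions


compounds, reactions = generate_network({1, 2})
for c, gen in sorted(compounds.items()):
    print(f"C{c}: first formed in generation {gen}")
print(len(reactions), "reactions")
```

Without the carbon cap, the aldol rule alone would keep producing ever longer chains, a crude analogue of the combinatorial explosion noted earlier; generators that operate on full molecular graphs distinguish isomers, so their networks grow far faster than this sketch.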
Figure 3a shows a series of representative time-course measurements that follow changes in compound concentrations in response to a modulated input of 3. The search procedure was used to estimate how the formose reaction pathways self-organise in response to increasing formaldehyde concentration (initiated by 3; Fig. 3b). When [1] is 5 mM, the operative reaction pathways are mainly accounted for by a small set of reactions between C3 species that form C6 compounds (green pathways). Increasing [1] to 10 mM triggers an expansion of the reaction repertoire: the number of possible pathways for formaldehyde addition (red) increases, with a corresponding increase in the number of proton-transfer pathways (black). However, chain-growth pathways based on 1 do not appear to account completely for the observed behaviour. Although compound 14 (a branched aldopentose) has a lower amplitude than 9, consistent with chain growth via formaldehyde addition, 20 (ribulose) and 18 (lyxose) are more strongly coupled to the input modulation than expected (Fig. 2a). Moreover, the 3-ketopentoses 12 and 13, which would be required as intermediates in the formaldehyde growth pathway towards 2-ketopentoses and aldopentoses, are present only at very low concentrations. For these reasons, we attribute the production of 20 to the reaction between 2 and the enolate of 3 and 4 (blue pathways). Further increasing [1] (> 50 mM) builds C1 growth pathways onto the previously described set, producing 29 and 30 by formaldehyde addition to enolates derived from 18 and 20.
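The reasoning applied to 20 and 18 can be phrased as a consistency check between a candidate network and measured amplitudes: if amplitudes fall with graph distance from the modulated input, a compound that lies further from the input than another should not be more strongly modulated. The sketch below computes breadth-first-search distances and lists violating pairs. The compound labels, both networks and the amplitudes are hypothetical, and this simplified check is not the pathway-search procedure used in the study.

```python
from collections import deque


def bfs_distances(graph, source):
    """Number of reaction steps from source to each reachable compound."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def inconsistent_pairs(dist, amplitude):
    """Pairs (nearer, farther) in which the farther compound is more strongly modulated."""
    return sorted(
        (a, b)
        for a in amplitude
        for b in amplitude
        if dist[a] < dist[b] and amplitude[a] < amplitude[b]
    )


# Hypothetical relative amplitudes of three products of the modulated input "Cin"
amplitude = {"X": 0.8, "Y": 0.5, "Z": 0.7}

chain = {"Cin": ["X"], "X": ["Y"], "Y": ["Z"]}          # Z reached only stepwise
shortcut = {"Cin": ["X", "Z"], "X": ["Y"], "Y": ["Z"]}  # plus a direct route to Z

print(inconsistent_pairs(bfs_distances(chain, "Cin"), amplitude))
print(inconsistent_pairs(bfs_distances(shortcut, "Cin"), amplitude))
```

In the stepwise chain, Z is three steps from the input yet more strongly modulated than Y, so the pair is flagged; adding the hypothetical direct route to Z, analogous to a coupling between C2 and C3 building blocks, removes the inconsistency.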
The pathway-search protocol was applied to all experiments in which input modulation was used, giving a list of reactions for each condition. These reactions describe how each product distribution was formed from the carbon-containing inputs. Each reaction was assigned to a class, and recasting the reaction lists as counts per reaction class provides a condensed view of how formose reactivity adapts to environmental conditions (Fig. 3c).
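Condensing pathway-search output into class counts amounts to a tally per condition. The sketch below assumes that each reaction is stored as a (class, reactants, products) tuple; the class names, condition labels and reaction lists are hypothetical placeholders rather than search results from the study.

```python
from collections import Counter

# Hypothetical reaction classes; the classification used in the study may differ
REACTION_CLASSES = ("aldol (formaldehyde)", "monosaccharide-enolate",
                    "protonation", "deprotonation", "retro-aldol", "Cannizzaro")

# Hypothetical pathway-search output: (reaction_class, reactants, products)
search_results = {
    "low [1]": [
        ("deprotonation", ("C3",), ("C3 enolate",)),
        ("monosaccharide-enolate", ("C3", "C3 enolate"), ("C6 a",)),
        ("monosaccharide-enolate", ("C3", "C3 enolate"), ("C6 b",)),
    ],
    "high [1]": [
        ("deprotonation", ("C3",), ("C3 enolate",)),
        ("aldol (formaldehyde)", ("C1", "C3 enolate"), ("C4",)),
        ("deprotonation", ("C4",), ("C4 enolate",)),
        ("aldol (formaldehyde)", ("C1", "C4 enolate"), ("C5",)),
        ("protonation", ("C4 enolate",), ("C4 isomer",)),
        ("monosaccharide-enolate", ("C2", "C3 enolate"), ("C5 isomer",)),
    ],
}


def class_count_table(results, classes=REACTION_CLASSES):
    """Condense each condition's reaction list into counts per reaction class."""
    table = {}
    for condition, reactions in results.items():
        counts = Counter(cls for cls, _reactants, _products in reactions)
        unknown = set(counts) - set(classes)
        if unknown:
            raise ValueError(f"unclassified reactions: {sorted(unknown)}")
        # Counter returns 0 for classes that never occur under this condition
        table[condition] = [counts[cls] for cls in classes]
    return table


for condition, row in class_count_table(search_results).items():
    print(condition, dict(zip(REACTION_CLASSES, row)))
```

Using a fixed class order gives every condition a row of the same length, so the rows can be plotted side by side or compared directly across conditions.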
Following the branches of the dendrogram traversed in Fig. 2d (variation in [1]) reveals key reaction characteristics that govern the various reaction outcomes. A significant feature of branch I is the relatively low proportion of formaldehyde aldol addition reactions. Most of the reactivity is accounted for by monosaccharide-enolate reactions between C3 compounds, which create products 32 and 37 (Fig. 4a, branch I).38,44 Moving to branch VII (higher [1]), the reaction repertoire expands and aldol addition reactions involving 1 are added to the network, in particular those in which the α-carbon is bound to a hydrogen or a glycol group. A range of protonation/deprotonation reactions are also promoted in branch VII. Deprotonation is favoured at less sterically hindered positions (where the α-carbon is bound to a hydrogen or methoxy group; e.g. follow the sequence 3, 9, 12, 20, 29 in Fig. 4a, branch VII), whereas protonation is favoured at α-carbons bound to methoxy groups. Interestingly, the number of monosaccharide-enolate reactions also increases, suggesting that some monosaccharide products react further with other members of the network.
The formose reaction reorganises differently when [Ca2+] and [NaOH] are varied. At high [Ca2+] (52 mM) and low [NaOH] (2.5 mM), a limited set of pathways is formed, most of which can be accounted for by formaldehyde addition and proton-transfer reactions terminating at 14 via 9 (the pathway connecting 3, 9 and 14 in Fig. 4b). Interestingly, the linear C5 compounds 12 and 13 are not formed in significant quantities, so the reaction reaches the branched 'dead end' 14. However, the system unexpectedly avoids forming the branched C4 compound 7 under these conditions. Decreasing the [Ca2+]:[NaOH] ratio (35 mM:10 mM) leads to a more significant contribution from formaldehyde-controlled pathways: the branched C4 compound 7 is formed, the branched C5 compound 14 is demoted, and both the population of C5 species and the number of Cannizzaro reactions increase. As the ratio is decreased further, the conditions and reaction pathways begin to resemble those of the high-[1] regime described above.
In contrast to prevailing views of the formose reaction, our data indicate that formaldehyde-based chain-growth pathways do not completely account for the observed behaviour. Rather, reactions between C2 and C3 compounds are key chain-building reactions.20,46,47 Surprisingly, we observe the emergence of a self-organised cyclic set of reactions that explains how the C2 monosaccharide 2 is created by retro-aldol reactions, as described in Breslow's proposed mechanism for autocatalysis in the formose reaction (Fig. 4c).45 Although the Breslow cycle is usually viewed as an autocatalytic mechanism, our results show how it can contribute to formose reactivity as a generator of 2, 3 and 4 (and their enolates), which act as building blocks embedded in a set of pathways in which chain growth occurs via formaldehyde addition. As such, the Breslow cycle can be envisaged as a source of new reaction pathways through which monosaccharides may be built. These reactions between formose reaction products are an excellent example of how underlying patterns in chemical reactivity define reaction outcomes. Thus, we propose that reinforcement of molecular diversity in the formose reaction does not necessarily occur via promotion of an initiating species (glycolaldehyde). Rather, diversity may be promoted by the activation of a class of reactions in which longer carbon chains are synthesised from building blocks containing more than one carbon atom.