Molecular convolutional neural networks with DNA regulatory circuits
Xiewei Xiong 1,3 , Tong Zhu 1,3 , Yun Zhu 1 , Mengyao Cao 1 , Jin Xiao 1 , Li Li 1 , Fei Wang 2 , Chunhai Fan 2 and Hao Pei 1 ✉

Complex biomolecular circuits enabled cells with intelligent behaviour to survive before neural brains evolved. Since DNA computing was first demonstrated in the mid-1990s, synthetic DNA circuits in liquid phase have been developed as computational hardware to perform neural network-like computations that harness the collective properties of complex biochemical systems. However, scaling up such DNA-based neural networks to support more powerful computation remains challenging. Here we present a systematic molecular implementation of a convolutional neural network algorithm with synthetic DNA regulatory circuits based on a simple switching gate architecture. Our DNA-based weight-sharing convolutional neural network can simultaneously implement parallel multiply–accumulate operations for 144-bit inputs and recognize patterns in up to eight categories autonomously. Further, this system can be connected with other DNA circuits to construct hierarchical networks to recognize patterns in up to 32 categories with a two-step approach: coarse classification on language (Arabic numerals, Chinese oracles, English alphabets and Greek alphabets) followed by classification into specific handwritten symbols. We also reduced the computation time from hours to minutes by using a simple cyclic freeze–thaw approach. Our DNA-based regulatory circuits are a step towards the realization of a molecular computer with high computing power and the ability to classify complex and noisy information.

Artificial DNA circuits that can perform neural network-like computations have been developed, but scaling up these circuits to recognize a large number of patterns is a challenging task. Xiong, Zhu and colleagues experimentally demonstrate a convolutional neural network algorithm using a synthetic DNA-based regulatory circuit in vitro and develop a freeze–thaw approach to reduce the computation time from hours to minutes, paving the way towards more powerful biomolecular classifiers.

Categorization, a fundamental element of thinking, is an important mechanism for rapid information processing by neural circuits in the brain 1-3 . Since the birth of DNA computing, which dates back to Adleman's pioneering study 4 on solving the Hamiltonian path problem by means of DNA molecules, DNA-based artificial neural circuits 5-10 that mimic such categorization functions and can classify molecular inputs into discrete patterns have been developed. However, scaling these circuits up in complexity to support more powerful computational systems remains challenging. Qian, Winfree and Bruck demonstrated a network of interacting DNA strands that can act as artificial neurons and that remembers four molecular patterns by implementing the Hopfield neural network strategy 6 . Cherry and Qian then scaled up the molecular pattern recognition of DNA neural networks to enable the recognition of nine molecular patterns by implementing a more powerful, winner-takes-all neural network strategy with DNA strands 8 . These results indicate that the sophistication and elegance of the algorithms implemented experimentally with reactive orthogonal DNA molecules play a key role in determining the computational efficiency of DNA neural network circuitry, which is analogous to electronic circuitry 11,12 .
The convolutional neural network (ConvNet) is a powerful computational model for categorization in deep learning 13-18 , in which the connection pattern between neurons resembles the organization of the visual cortex in animals 19 . ConvNets are weight-sharing artificial neural networks, in which convolution kernels slide along the input features and output feature maps. Compared with fully connected networks, which result in overly complex network structures, the ConvNet lies at the lower extreme on the scale of connectivity and complexity: it features a sparse topology that effectively reduces the number of network connections and weight parameters 15,16 , and thus holds great promise for vastly smaller-scale molecular implementations using sparser networks.
Although ConvNet algorithms enabling efficient hardware implementation in electronic computing devices and in photonic and quantum computing devices have been proposed 14,20-25 , they have not yet been demonstrated in molecular computing systems.
Here, we present a systematic strategy for the computational design of networks of reactive orthogonal DNA molecules that can implement the ConvNet algorithm. We show that such a DNA-based ConvNet can simultaneously implement multiple parallel multiply-accumulate (MAC) operations and recognize patterns in up to eight categories after training in silico. Each pattern comprises up to 43 distinct DNA strands that are selected to trace individual handwritten symbols, chosen from a set of 144 DNA strands that represent the 144 bits in 12 × 12 patterns. By connecting upstream logic circuits that activate a specific set of weight molecules, the DNA-based ConvNet can recognize patterns in up to 32 categories by using a two-step classification approach: performing a coarse classification on the type of language (Arabic numerals, Chinese oracles, English alphabets and Greek alphabets) and then classifying into specific handwritten symbols. Moreover, we show that a simple cyclic freeze-thaw approach can significantly accelerate the reactions in such large-scale DNA neural networks, decreasing the computation time from hours to minutes. Our approach leads towards the realization of molecular computers with high computing power and the ability to classify complex and noisy information.

A DNA switching gate architecture design for ConvNet circuits.
A ConvNet consists of an input layer, a convolutional layer, a nonlinear layer and an output layer. In each layer, an intermediate array of pixels, referred to as a feature map, is produced from the previous layer. Figure 1a illustrates the operating principle of the ConvNet for recognition tasks, where an n × n input symbol is convolved with a
k × k kernel function (with a stride of 1) to compute a feature map with dimensions of (n − k + 1) × (n − k + 1). When operating the ConvNet, the input symbol is grouped into (n − k + 1)² receptive regions (Fig. 1a, area marked with dashed blue line) with dimensions of k × k. The elements of these receptive regions share the same weights, which enables a sparse topology that effectively reduces the number of network connections. Mathematically, a convolution operation requires multiple MAC operations with shared weights. To implement a weight-sharing MAC operation using DNA molecules, we proposed a switching gate architecture 26 . Each switching gate is associated with a gate base strand (for example, domains T*-Wt*-Ii*-T* in Fig. 1b) that has a recognition domain (Ii* in Fig. 1b) and a weight tuning domain (Wt* in Fig. 1b) flanked by two toehold domains (T* in Fig. 1b); these domains are functionally independent. Here and below, symbols such as T, Wt and Ii denote elementary functional domains of DNA species, whereas the same symbols with an asterisk denote their complementary sequences. Varying the sequence of the recognition domain (Ii*) enables it to respond to different inputs, leading to different signal transmission pathways. Varying the sequence of the weight tuning domain (Wt*) determines the weights assigned to the input, whereby the same weight can be assigned to different inputs by concatenating the same weight tuning domain with different recognition domains. This architecture allows independent control of the signal transmission and weight assignment functions during computation. In this way, we can implement weight-sharing MAC operations at the molecular level and construct molecular ConvNets with reactive orthogonal DNA molecules.

Fig. 1 | The ConvNet and its molecular implementation with DNA regulatory circuit systems. a, Left: the architecture of the ConvNet. Right: a schematic of the operating principle of the ConvNet for recognition tasks. The individual receptive region of the inputs is multiplied by the kernel to perform a MAC operation to output y, where we assume that the stride is 1. In this way, an input matrix can be convolved with the same kernel by mapping the convolution computation into (n − k + 1)² MAC operations. Note that x i (1 ≤ i ≤ n²) are binary inputs and w t (1 ≤ t ≤ k²) are analogue weights. The subscripts i and t represent the position of elements in the input matrix X and of the kernel matrix W, respectively. b, The implementation of the ConvNet with DNA regulatory circuit systems. Upper: the inputs can be encoded by a set of single strands, wherein 1 or 0 represents the presence or absence of input strands. Middle: the weight matrix can be stored by programming a set of weight molecules with a simple switching gate architecture. The assignment of shared weights is implemented by the sequence of the weight tuning domains. Bottom: the DNA implementation of summation and subtraction. In all panels, symbols with subscripts denote DNA species (enclosed in coloured solid as well as dotted circles), whereas those without subscripts refer to the elementary functional domains (represented by coloured lines) of DNA species. An asterisk denotes a complementary sequence. The symbol ∑ indicates the sum over all inputs.
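As a numerical illustration of this decomposition of a convolution into weight-sharing MAC operations (an algorithmic sketch only, not the molecular implementation; the array values below are arbitrary examples):

```python
import numpy as np

def conv2d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """'Valid' convolution with stride 1: each k x k receptive region of the
    n x n input is multiplied element-wise by the shared kernel w and summed,
    i.e. one MAC operation per output pixel of the (n-k+1) x (n-k+1) map."""
    n, k = x.shape[0], w.shape[0]
    out = np.zeros((n - k + 1, n - k + 1))
    for r in range(n - k + 1):
        for c in range(n - k + 1):
            out[r, c] = np.sum(x[r:r + k, c:c + k] * w)  # multiply-accumulate
    return out

x = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])       # 3 x 3 binary input pattern
w = np.array([[0.8, 0.2],
              [0.2, 0.8]])      # 2 x 2 analogue kernel (shared weights)
y = conv2d_valid(x, w)          # (3 - 2 + 1)^2 = 4 MAC operations
```

Every output pixel reuses the same two-by-two weight matrix, which is exactly the sharing that the weight tuning domains encode at the sequence level.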

DNA implementation of MAC and convolution operations.
We started the experimental demonstration with the weight multiplication function (w t x i ), in which the variables x i are binary inputs and the variables w t are analogue weights. The subscripts i and t represent the position of the elements of the input matrix X and of the kernel matrix W, respectively (Fig. 1). The positions of the matrix elements correspond to different sequences of the recognition domain (Ii) and of the weight tuning domain (Wt) in the DNA strand displacement implementation (Extended Data Fig. 1). The binary values taken by the variables x i correspond to the presence or absence of domain Ii in input strand X i , while the values of the variables w t correspond to the concentrations of the weight tuning molecules M Wt or weight substrate molecules N Wt,Ii,j that contain the domain Wt, where symbols with subscripts such as M Wt and N Wt,Ii,j denote DNA species. Weight multiplication is implemented by cascaded reactions (Fig. 2a) in which an input species X i converts an activated weight substrate molecule N Wt,Ii,j * to an intermediate product P Ii,j . N Wt,Ii,j * is formed when N Wt,Ii,j undergoes a spontaneous intramolecular conformational switch upon hybridization with the weight tuning molecule M Wt , allowing stoichiometric exchange of the activity of the DNA signals. In the absence of X i , no P Ii,j is generated. In the presence of X i , the final concentration of P Ii,j is determined by the concentration of N Wt,Ii,j *, thus setting the value of the weighted multiplication (Supplementary Figs. 2-9 and Fig. 2b). Note that negative weights assigned to inputs are implemented using distinct output sequences (domain Pj in Extended Data Fig. 1) for N Wt,Ii,j . One can then compute the sum of the weighted inputs within the same receptive region by using the same output sequence for all the intermediate species.
This is implemented with reactions wherein all the intermediate species P Ii,j stoichiometrically convert the summation gate (Sd j,k ) to a common weighted-sum species Ss j,k (Fig. 2c,d). The sums of positive and negative weights can be implemented with the concentrations of two different weighted-sum species (Supplementary Fig. 10). To complete the summation, the positive and negative weighted sums must be subtracted from one another (Fig. 2e). Specifically, all positive weighted-sum species Ss j,k can convert the double-stranded complex Dd k,Yi to an intermediate species Ds k,Yi . All negative weighted-sum species Ss i,n generated from the upstream summation gate can bind to the toehold of the inhibitory strand In n and branch migrate to form inert waste species, producing reactive annihilation species Sub n,Yi * through intramolecular conformational switching. The subtraction is thus realized by Sub n,Yi * and Ds k,Yi annihilating each other. Only the remaining Ds k,Yi interacts with the downstream reporting gate (Fig. 2f) to read the output signal; otherwise, the reaction is terminated. This design means that only positive weighted-sum species Ss j,k can activate the downstream reactions, a feature shared with the nonlinear activation function used in this model, the rectified linear unit (ReLU).
We verified this design by using 36 different weighted-sum (S 1 and S 2 ) combinations (Fig. 2g). As expected, the signal was indeed gradually attenuated as the concentration of S 2 was increased. Note that this circuit does not implement a perfect subtraction operation. This is because a small fraction of Ss j,k could interact directly with the downstream complex before encountering Sub n,Yi * (the higher the concentration, the greater its chance of escaping from complete annihilation), similar to observations in previous work 8 . Despite this imperfection, the circuit does compute correctly for weighted sums when the concentrations of the weighted-sum species are not too close to each other and are not too high.
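The net effect of the subtraction and annihilation steps can be summarized by a simple idealized model (a sketch only; as noted above, the real circuit deviates from this when the two weighted-sum concentrations are close to each other or too high):

```python
def relu_subtract(s_pos: float, s_neg: float) -> float:
    """Idealized subtraction-by-annihilation: the negative weighted-sum species
    annihilates an equal amount of the positive species, and only a positive
    remainder can trigger the downstream reporter, i.e. ReLU(s_pos - s_neg)."""
    return max(0.0, s_pos - s_neg)

high = relu_subtract(0.8, 0.3)  # positive remainder propagates downstream
low = relu_subtract(0.3, 0.8)   # fully annihilated: no output signal
```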
Subsequently, we designed a DNA-based MAC circuit to perform a two-input MAC operation ( y = w 1 x 1 + w 2 x 2 ), which is implemented by adding one summation gate to two parallel weight multiplications (Extended Data Fig. 2a,b). The circuit consists of two weight substrate molecules N Wt,Ii,j , two weight tuning molecules M Wt and one summation gate. Each N Wt,Ii,j that is activated by the corresponding M Wt has a different recognition domain (I1* and I2*) to respond to distinct inputs (X 1 and X 2 ), and each has a different weight tuning domain (W1* and W2*) to determine the weights assigned to the input. Using the same output sequence for two N Wt,Ii,j would enable two intermediate species P Ii,j to be connected to the same summation gate. We can tune the concentration of the weight tuning molecules (M W1 and M W2 ) to set different values for the weights for the corresponding multiplication reactions. As expected, the circuit exhibits stoichiometric behaviour with the output signal and the concentration of M Wt (Extended Data Fig. 2c,d). By adding one extra weight multiplication to the downstream summation gate, we also demonstrated a circuit that computes the three-input MAC function ( Supplementary Fig. 11).
We now show the molecular implementation of a simple convolution operation of a 2 × 2 input pattern with a kernel size of 2 × 1, which is mathematically equivalent to two parallel MAC operations. Each receptive region of the input pattern (x 1 and x 3 , and x 2 and x 4 ) is multiplied by the weights w 1 and w 2 to obtain the weighted pixels x 1 w 1 and x 3 w 2 , and x 2 w 1 and x 4 w 2 , respectively. The feature map (y 1 and y 2 ) is then obtained by summing up the weighted pixels in the same receptive region (Fig. 3a). The 2 × 2 input pattern is encoded with four DNA input strands (X 1 , X 2 , X 3 and X 4 ). The convolution kernel is encoded in the sequence of the weight tuning domains (Fig. 3b, green domains W1* and W2*). To complete the multiplication with shared weights, we designed four weight substrate molecules (N W1,I1,5 , N W1,I2,6 , N W2,I3,5 and N W2,I4,6 ), two of which (for example, N W1,I1,5 and N W1,I2,6 in Supplementary Fig. 12) have the same weight tuning domain, corresponding to the pixels that interact with the same kernel in different local receptive regions, but different recognition domains at the 3′ end to connect to the downstream summation gates (Sd 5,1 and Sd 6,2 ). All the input patterns were binary patterns in which 1 or 0 represents the presence or absence of input strands. The value of the analogue weights determined from the convolution kernel is implemented with the concentrations of N Wt,Ii,j . To compute the convolution, each DNA sub-circuit runs independently and in parallel to compute the MAC operation in each receptive region (Supplementary Fig. 13). For a specific pattern, the corresponding weight tuning molecules M Wt activate the respective weight substrate molecules N Wt,Ii,j in the two receptive regions.
Thus, the assignment of shared weights by the convolution kernel is implemented with the activated weight substrate molecule N Wt,Ii,j * and X i through a DNA strand displacement reaction, resulting in the release of the intermediate species P Ii,j . Two summation gates Sd j,k convert P Ii,j in the same receptive regions to weighted-sum species Ss j,k , leading to the triggering of the downstream reporting gates. For an experimental demonstration, we chose 16 input patterns, and both outputs achieved their correct 'on' or 'off ' state, indicating that the DNA circuit correctly implemented the convolution computation ( Fig. 3c and Supplementary Fig. 14). For example, with inputs X 1 X 2 X 3 X 4 = 1001, the circuit output y 1 (or y 2 ) was proportional to the product of the input concentration and the corresponding weight, as designed.
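In idealized form, the two parallel MAC operations above reduce to simple arithmetic (a sketch assuming perfect stoichiometry; the weight values 0.8× and 0.2× follow the example in the text):

```python
def convolve_2x2_with_1x2_kernel(x, w):
    """Two parallel MAC operations sharing the kernel (w1, w2):
    y1 = w1*x1 + w2*x3 for receptive region {x1, x3}, and
    y2 = w1*x2 + w2*x4 for receptive region {x2, x4}."""
    x1, x2, x3, x4 = x
    w1, w2 = w
    return (w1 * x1 + w2 * x3, w1 * x2 + w2 * x4)

# Input pattern X1 X2 X3 X4 = 1001 with kernel weights w1 = 0.8x, w2 = 0.2x:
y1, y2 = convolve_2x2_with_1x2_kernel((1, 0, 0, 1), (0.8, 0.2))
```

With this input, each output is simply one input strand times its corresponding weight, matching the proportionality observed in the experiment.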

A DNA-based ConvNet for molecular pattern recognition.
Having shown that the DNA circuit is capable of processing the convolution, we next built a DNA-based ConvNet that can 'remember' two handwritten symbols, that is, the Chinese oracles for 'fire' and 'earth' (Fig. 4a). The training set consists of 48,000 patterns of handwritten symbols from the Sinica oracle database. In silico, all the original symbols were converted to 144-bit binary patterns for network training by rescaling them to a 12 × 12 grid and setting each pixel to 1 when it exceeded a threshold (Supplementary Fig. 15). The convolution kernel (a 6 × 6 matrix) slides along the input patterns with a stride of 6 and subsequently generates a corresponding output feature map (Fig. 4b and Neural network training and testing section). We evaluated the network performance on a reference dataset after training, reaching an accuracy of 97% (Fig. 4c). We implemented this ConvNet model by encoding the convolution kernel in the sequence of the weight tuning domain, and implementing the value of the weights with the concentration of the weight substrate molecule N Wt,Ii,j (Extended Data Fig. 3a). The test input binary patterns were encoded with single strands, wherein each 1 or 0 represents the presence or absence of an input strand (Fig. 4d). The DNA-based ConvNet implements pattern recognition by comparing its local feature with all the memories and identifying the most similar memory (Extended Data Fig. 3b). For example, each receptive region of a 'fire' can simultaneously interact with the same kernel function to export feature maps through DNA strand displacement cascades.

Fig. 2 (caption, panels b-g) | The initial concentration of N Wt,Ii,j and X i is 2×. c, The DNA implementation of summation. d, The fluorescence kinetics of P I1,2 and P I2,2 alone and their sum (P I1,2 + P I2,2 ). Note that the final fluorescence signal is proportional to the weighted sum (given by P I1,2 + P I2,2 ). The initial concentration of Sd j,k is 2×, the concentration of P I1,2 is 0.4× and 0.2× and the concentration of P I2,2 is 0.6× and 0.8×. e, The DNA implementation of subtraction. Note that common reactants involved in the subtraction and downstream reactions, that is, toeholds with different lengths, are used here to control the reaction order 51,52 . The toehold length of Sub n,yi * and In n is 7 nt, while the toehold length of the complex Dd k,yi and the reporter Rep yi is 5 nt. f, The DNA implementation of reporting. g, The subtraction behaviour for different combinations of positive weighted-sum S 1 and negative weighted-sum S 2 . Each small graph shows the normalized signal (from 0 to 1) versus time (from 0 to 10 h), with colours indicating the signal intensity. The initial concentration of Dd k,yi and Sub n,yi is 1× and 2×, respectively. The standard concentration is 1× = 50 nM. DNA species are represented by coloured solid and dotted circles, whereas different domains are represented by coloured lines. The symbol ∑ indicates the sum over all inputs.

Fig. 3 | A convolution computation via multiple parallel MAC operations. a, The detailed calculation process of the convolution with a 2 × 2 input matrix, a kernel size of 1 × 2 and a stride of 1. The feature map is generated from the interactions between the kernel function and different receptive regions. b, Top: an abstract schematic diagram of the convolution. The symbol ∑ indicates the sum over all inputs. The 2 × 2 input pattern is encoded with four DNA single strands (X 1 , X 2 , X 3 and X 4 ). The weight substrate molecule N Wt,Ii,j can be activated by the corresponding weight tuning molecule M Wt ; the activated weight substrate molecule N Wt,Ii,j * can then interact with the local receptive region and export the computation results. Middle: the DNA implementation of a MAC operation in one receptive region (X 1 and X 3 ) of the input pattern. In this receptive region, two N Wt,Ii,j , one Sd j,k , one Rep yi and two M Wt were designed. Bottom: the detailed reaction pathway of a MAC operation with input X 1 in one receptive region. Arrows indicate the flows of the reactions. DNA species are represented by coloured solid and dotted circles, whereas different domains are represented by coloured lines. The full DNA strand displacement cascades implementing a convolution are shown in Supplementary Fig. 13. c, The characterization of a convolution with six different input patterns after 3 h. The value of the weights determined from the convolution kernel is 0.8× (blue dot) and 0.2× (grey dot), respectively. The blue and red histograms correspond to the outputs of y 1 and y 2 , respectively. Blue dots indicate that the weight w 1 is multiplied by the corresponding input (x 1 or x 2 ), while grey dots indicate that the weight w 2 is multiplied by the corresponding input.
As the network runs, a subset of weight tuning molecules M Wt can activate the corresponding weight substrate molecules N Wt,Ii,j in four receptive regions at the same time, enabling multiple weight-sharing MAC operations to be performed in parallel. This allows the DNA circuits to activate a specific reaction pathway in the convolution layer when exposed to a specific pattern, which can enhance the robustness of the network (Supplementary Fig. 16). Then, a max-pooling process (with a pooling size of 2 × 1 and stride of 1) is applied to reduce the size of the feature map by annihilating the smaller of two pixels through the cooperative hybridization method introduced by Cherry and Qian 8 (Extended Data Fig. 3c), in which two contiguous pixels are represented by the concentrations of two distinct nucleic acid sequences.
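The in-silico counterpart of this pipeline (convolution at stride 6 followed by max-pooling) can be sketched as follows; pooling down each column of the 2 × 2 feature map, and the uniform stand-in values for the pattern and kernel, are assumptions made for illustration:

```python
import numpy as np

def convnet_forward(x: np.ndarray, w: np.ndarray):
    """Idealized forward pass: a 12 x 12 binary input convolved with a 6 x 6
    kernel at stride 6 gives a 2 x 2 feature map; max-pooling then keeps the
    larger of each pixel pair, mimicking the cooperative-hybridization
    annihilation of the smaller one."""
    fmap = np.array([[np.sum(x[6 * r:6 * r + 6, 6 * c:6 * c + 6] * w)
                      for c in range(2)] for r in range(2)])
    pooled = fmap.max(axis=0)  # 2 x 1 max-pooling: one winner per column
    return fmap, pooled

pattern = np.ones((12, 12))   # stand-in for a 144-bit binary input
kernel = np.ones((6, 6))      # stand-in for a trained 6 x 6 kernel
fmap, pooled = convnet_forward(pattern, kernel)
```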
As shown in the experimental data ( Fig. 4e and Supplementary  Fig. 17), the input patterns of two handwritten symbols each triggered the desired outputs, indicating that two handwritten symbols can be classified. When each oracle pattern was rotated by an angle from 0° to 360° in steps of 30°, the circuit still yielded the desired output for all 26 test input patterns, indicating that the circuit correctly classified the rotated patterns. In total, 177-250 distinct molecules were used for all the test patterns. As expected, we showed that the DNA-based ConvNet can also remember eight 144-bit molecular patterns simultaneously ( Supplementary Figs. 18-22).
A hierarchical neural network for recognizing 32 patterns.
A feature of our ConvNet is that different input weight tuning molecules can selectively activate a specific set of weights, allowing the same set of DNA molecules to be used for different tasks. The use of weight tuning molecules as outputs of an upstream circuit demonstrates the possibility of building hierarchical networks for more sophisticated categorization tasks. To validate this approach, we proposed a two-step classification approach that first uses a logic gate to perform coarse classification and then uses a ConvNet to perform finer classification. To demonstrate this approach experimentally, we chose a task of recognizing 32 handwritten symbols that can be divided into 4 groups: 8 Chinese oracles (from the Sinica oracle database), 8 Arabic numerals (from the Modified National Institute of Standards and Technology database), 8 English alphabets and 8 Greek alphabets (from the Kaggle website). In silico, we converted all the original handwritten symbols to binary patterns with two layers (Fig. 5a). Layer 1 is on a 1 × 4 grid and acts as an input for the logic circuits to perform a coarse classification on language (for example, oracle is 1000), yielding outputs that selectively activate the downstream ConvNet subnetwork to perform fine classification into specific handwritten symbols using layer 2 on a 12 × 12 input pattern. The four groups in layer 2 can be separately trained in silico with their respective datasets to obtain the optimal model (Fig. 5b), yielding the values of four convolution kernels with dimensions of 3 × 6 (with a stride of 3 × 6) (Supplementary Fig. 23 and Neural network training and testing section). This network performed well on the reference dataset, reaching >84% accuracy in each group (Fig. 5c). We implemented the two-step classification approach experimentally by designing different switching gates to encode the four convolution kernels.
Both the tags in layer 1 and the inputs in layer 2 are binary patterns, in which each 1 or 0 indicates the presence or absence of a tag strand (or an input strand), respectively (Extended Data Fig.  4a). The pattern classification can be performed by using the following steps (Fig. 5d): (1) The tag strand in layer 1 reacts with the reporter gate to generate the output signal y m , which can be recognized as the corresponding coarse category (Extended Data Fig. 4b,  c). Meanwhile, the tag strand reacts with the fan-out gate to release a set of weight tuning molecules, which can be captured from total DNA strands using magnetic beads through a hybridization reaction using biotinylated capture probes. Then, the weight tuning molecules are displaced from the beads by an invader strand to activate the downstream DNA neural networks (Extended Data Fig. 4b-f).
(2) The weight tuning molecules generated from the upstream logic circuits then activate the corresponding neural network to implement the recognition (Fig. 5e), in which each output y m is uniquely correlated to specific handwritten symbols to enable the fine classification. Two fluorescence signals are collected from layer 1 and layer 2, respectively, to determine the recognition results. For example, 'fire' is recognized if and only if y 1 = 1 and y 1 ′ = 1 (where y 1 is the output identifying the coarse category of 'oracle' and y 1 ′ is the output identifying the fine category of 'fire'). In total, constructing the DNA-based ConvNet that can remember 32 molecular patterns requires 368-512 distinct molecules for all the test patterns. As expected, the circuit yielded the desired pair of outputs for 32 representative example patterns with group identities (Fig. 5f and Supplementary Figs. 24 and 25). In general, with this hierarchical approach, constructing a DNA-based ConvNet that can recognize b × m distinct n-bit patterns (where b is the number of groups and m is the number of patterns in each group) with a kernel size of e bits requires n + 5m + (m + 1) × b × e molecules (Supplementary Fig. 26).
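The scaling rule can be checked directly. The parameter values below (e = 18 for a 3 × 6 kernel, with n = 144, b = 4 and m = 8 as in the 32-symbol task) are an illustrative reading of the text, and the resulting total is simply the formula's output, not an experimentally reported molecule count:

```python
def molecule_count(n: int, m: int, b: int, e: int) -> int:
    """Molecule count for a hierarchical DNA ConvNet recognizing b*m distinct
    n-bit patterns (b groups of m patterns each, kernel size e bits), per the
    scaling rule n + 5m + (m + 1)*b*e stated in the text."""
    return n + 5 * m + (m + 1) * b * e

total = molecule_count(n=144, m=8, b=4, e=18)
```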
A cyclic freeze-thaw approach to accelerate the DNA circuits.
The speed of execution remains a challenge for large-scale DNA neural network reactions. For example, our computation of two categories took longer than 20 h (Supplementary Fig. 17), while the computation time increased to over 36 h for 32 categories (Supplementary Fig. 25). To accelerate these DNA circuits, we developed a simple cyclic freeze-thaw approach (Fig. 6a). Cyclic freeze-thaw iteratively drives the strand displacement reaction towards thermodynamic equilibrium, which can accelerate the basic strand displacement reaction by ~75-fold (Extended Data Fig. 5 and Supplementary Fig. 27). For a larger-scale circuit, 160 test patterns of 144 bits can be recognized in less than 30 min through five freeze-thaw cycles, a computation that would otherwise require hours (Fig. 6b,c).
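As a rough model of this speed-up, assume first-order kinetics (an assumption; the rate constant k = 0.15 h⁻¹ is chosen only so the baseline matches the ~20 h two-category computation). A ~75-fold rate increase then shortens completion from hours to roughly a quarter of an hour:

```python
import math

def time_to_completion(k_per_hour: float, completion: float = 0.95) -> float:
    """Hours for a first-order process to reach the given completion fraction:
    t = -ln(1 - completion) / k. A crude stand-in for the circuit kinetics."""
    return -math.log(1.0 - completion) / k_per_hour

baseline = time_to_completion(k_per_hour=0.15)          # roughly 20 h
accelerated = time_to_completion(k_per_hour=0.15 * 75)  # roughly 16 min
```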

Conclusion
We have experimentally implemented a DNA-based ConvNet based on 512 DNA strands that can perform eight parallel weight-sharing MAC operations simultaneously. The switching gate architecture in the design of the DNA circuits allows independent control of the signal transmission functions and the weight assignment functions, which is functionally similar to the riboswitches in gene regulatory circuits, both consisting of two independent functional domains that sense and respond to external inputs. This hints at the possibility of embedding ligand-responsive molecular switches to allow biochemical circuits to adapt their functions in response to changes in the environment 27,28 . The fact that we were able to use one additional circuit to selectively activate a specific set of weights in the DNA-based ConvNet shows the potential to embed learning into biochemical circuits, in which the upstream DNA circuit updates the current approximation of the weights in supervised learning tasks 29-32 . The massively parallel feature inherent to DNA molecules could enable autonomous parallelization of convolution operations, which would be particularly well suited for more scalable information processing. By using a hierarchical network that first uses a logic gate to perform coarse classification and then a ConvNet to perform finer classification, we also demonstrated that such circuits can be scaled up to classify patterns into 32 categories. This could also provide the possibility of integrating multiple circuit architectures 8,33 to enhance the computational power. Further scaling up would exacerbate problems such as leak reactions caused by nonspecific cross interactions or spurious binding.
Importantly, we have extended the key feature of ConvNets, sparse topology, to a DNA neural network, effectively reducing the complexity of the network architecture through sparsely connected neurons. This could allow more complex information processing and potentially endow molecular circuits with 'intelligent' behaviour that resembles a biological neural network. By interfacing with sensory inputs34–37, the DNA-based ConvNet could in principle use hundreds of targets as inputs, facilitating broader applications in disease diagnostics, expression profiling and precision medicine38–44.

Methods
Sequence design. Five types of molecular structure (Extended Data Fig. 1) were used in this work: the weight substrate molecules N Wt,Ii,j (1) and subtraction species Sub n,Yi (4) consist of three single strands; the summation gates Sd j,k (2) and complexes Dd k,Yi (3) consist of two single strands; the reporters Rep Yi (5) consist of single strands modified with fluorophore and quencher groups. All the DNA single strands used in this work were composed of long recognition domains and short toehold domains, except for the weight tuning domains used for weight multiplication, subtraction and reporting, which were composed of short stem domains and long loop domains. Note that these domains are functionally independent. Corresponding complementary sequences are indicated by an asterisk. On this basis, the sequence design was conducted at the domain level.
We generated several pools of domain sequences of different lengths according to a series of design heuristics45,46. To reduce secondary structures and undesired interactions, all the domain sequences were produced using a three-letter code (A, C and T). To reduce synthesis errors, no more than four A's or T's, and no more than three C's, were used in a row. The C content was kept at 30–70% to ensure comparable melting temperatures. For any two sequences in the pool, the longest matching subsequence was no more than 35% of the domain length, and any two sequences differed in at least 30% of their nucleotides. The sequences of single strands were generated by directly concatenating these domains.
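As an illustration, the per-sequence heuristics above (three-letter code, run-length limits, C content) and the pairwise matching check can be sketched in Python. The thresholds follow the text, but the function names and example sequences are our own:

```python
import re

def passes_heuristics(seq: str) -> bool:
    """Check one candidate domain against the per-sequence heuristics."""
    # Three-letter code only: A, C and T (no G in the top strands)
    if set(seq) - set("ACT"):
        return False
    # No more than four A's or T's, and no more than three C's, in a row
    if re.search(r"A{5}|T{5}|C{4}", seq):
        return False
    # C content kept at 30-70% for comparable melting temperatures
    c_frac = seq.count("C") / len(seq)
    return 0.30 <= c_frac <= 0.70

def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous match between two sequences."""
    best = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            best = max(best, k)
    return best

def pairwise_ok(a: str, b: str, max_frac: float = 0.35) -> bool:
    """Longest matching stretch must not exceed 35% of the domain length."""
    return longest_common_substring(a, b) <= max_frac * len(a)

print(passes_heuristics("ATCCACTTACCTACT"))   # a 15-nt domain that passes
print(passes_heuristics("AAAAATCTCCATCTC"))   # fails: five A's in a row
```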
Finally, a 15-nucleotide sequence pool for the recognition domains and a 27-nucleotide sequence pool for the weight tuning domains were generated. We checked that both pools satisfied the same pairwise criteria. To reduce gate–gate leakage46, we used two-nucleotide clamps in all the bottom strands, complementary to the first two nucleotides in the tail of the molecular species. We used three universal toeholds for all the DNA strands in the DNA-based neural networks, except for the DNA circuits used for the 32-pattern recognition, in which branch migration cannot be initiated by toehold domains without matching recognition domains. A Wt,Ii had six-nucleotide toeholds, composed of the five-nucleotide universal toehold and a one-nucleotide G extension, and was used in the weight multiplication layers to ensure an effective strand displacement reaction rate. Sub n,Yi had seven-nucleotide toeholds, composed of the five-nucleotide universal toehold and a two-nucleotide extension complementary to the two nucleotides next to the toehold of the upstream complexes. All the weight tuning molecules M Wt shared a seven-nucleotide universal toehold domain. All the other molecular complexes shared a five-nucleotide universal toehold domain. To reduce the side reactions caused by universal toehold binding in the DNA circuits used for the 32-pattern recognition, the toehold of M Wt differed between convolution kernels. Two nucleotides ('TT') were inserted between the toehold domains and recognition domains in the input strands X i and the single strand Ds k,Yi to ensure an effective strand displacement reaction rate.
The designed DNA strands were verified by NUPACK 47 to confirm their binding energy and specificity. Note that the bottom strands of the complexes in the network are complementary to the corresponding domains and thus contain A, G and T. We also validated the correct formation of the hairpin loop structures in the presence of M Wt by using NUPACK.
To ensure the reliability of the ConvNet model and avoid overfitting, sufficient data must be available in the training/test dataset. However, the number of images available for the Chinese oracles and handwritten Greek alphabets was far from sufficient, so the Augmentor software48 was used to augment the dataset. The four expanded datasets each had 1,000 different images per character. A separate dataset was constructed for the Chinese oracles 'fire' and 'earth'. Initially, there were 1,000 images for each oracle; each image was then rotated 24 times in 15° increments, giving a final dataset of 48,000 images. For each recognition task, 80.0% of the dataset was used for training and the rest for validation.
Each original handwritten symbol was rescaled to a greyscale image of 60 × 60 pixels using the Pillow software (https://doi.org/10.5281/zenodo.5394534). Pixel values in each image that exceeded a threshold were set to 1, while the rest were set to 0; the threshold can be adjusted for specific circumstances. In the experiments on the recognition of 32 handwritten symbols, each input symbol was replaced by a pair of input symbols (Fig. 5a, layers 1 and 2). Binary tags were attached to layer 1 to mark the coarse categories (1000 for Chinese oracles, 0100 for Arabic numerals, 0010 for English alphabets and 0001 for Greek alphabets). Layer 2 was kept as a 12 × 12 binary pattern to be classified at the finer level.
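A minimal NumPy sketch of the binarization step (the Pillow rescaling to 60 × 60 pixels is omitted here; the array values and the threshold are illustrative):

```python
import numpy as np

def binarize(image: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Threshold a greyscale image (values in [0, 1]) into a binary
    pattern: pixels above the threshold become 1, the rest 0."""
    return (image > threshold).astype(np.uint8)

# A toy 2x2 greyscale patch standing in for a rescaled 60x60 image
img = np.array([[0.1, 0.9],
                [0.6, 0.3]])
print(binarize(img))  # [[0 1]
                      #  [1 0]]
```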
Identifying rotated handwritten symbols with the ConvNet model is more demanding. Here, we tested the ability of the ConvNet to recognize rotated symbols of two Chinese oracles. First, the kernel size and the stride were set to 6 × 6 and 6, respectively, so the convolution operation yields an output feature map of shape 2 × 2. This is followed by the ReLU activation function, which truncates negative values to 0. The max-pooling step then reduces the 2 × 2 matrix to 1 × 2, with the pooling size and pool stride set to 2 × 1 and 1, respectively. The accuracy of this model reached 96.8% on the training set and 97.0% on the validation set.
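The convolution, ReLU and max-pooling steps above can be sketched in NumPy. The random input and kernel are placeholders, but the kernel size (6 × 6), stride (6) and pooling parameters (2 × 1 window, stride 1) follow the text:

```python
import numpy as np

def conv2d(x, kernel, stride):
    """Valid convolution (cross-correlation) with a given (row, col) stride."""
    kh, kw = kernel.shape
    sh, sw = stride
    oh = (x.shape[0] - kh) // sh + 1
    ow = (x.shape[1] - kw) // sw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i*sh:i*sh+kh, j*sw:j*sw+kw] * kernel)
    return out

def max_pool(x, pool, stride):
    """Max pooling with a (row, col) window and a scalar stride."""
    ph, pw = pool
    oh = (x.shape[0] - ph) // stride + 1
    ow = (x.shape[1] - pw) // stride + 1
    return np.array([[x[i*stride:i*stride+ph, j*stride:j*stride+pw].max()
                      for j in range(ow)] for i in range(oh)])

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(12, 12)).astype(float)   # binary 12x12 input
k = rng.standard_normal((6, 6))                       # one 6x6 kernel
fmap = np.maximum(conv2d(x, k, (6, 6)), 0.0)          # conv + ReLU -> 2x2
pooled = max_pool(fmap, (2, 1), 1)                    # max-pool -> 1x2
print(fmap.shape, pooled.shape)  # (2, 2) (1, 2)
```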
For the classification of the eight handwritten symbols, the input symbols were convolved with a kernel of size 3 × 6 and a stride of 3 × 6, so the first feature map has size 4 × 2. The ReLU activation function zeroes out any negative values in the feature map, and the 4 × 2 feature map is then flattened to a 1 × 8 matrix. After sufficient epochs, all four models achieved an accuracy of at least 85% on both the training and validation sets. In addition, the performance of the models on the training and validation sets was very similar, indicating no overfitting.
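The shape arithmetic for this model (3 × 6 kernel with 3 × 6 stride on a 12 × 12 input, then flattening) can be checked with a small helper; the function name is our own:

```python
import numpy as np

def conv_output_shape(in_shape, kernel, stride):
    """Shape of a valid convolution's output feature map."""
    return tuple((i - k) // s + 1 for i, k, s in zip(in_shape, kernel, stride))

fmap_shape = conv_output_shape((12, 12), (3, 6), (3, 6))
print(fmap_shape)                 # (4, 2)
print(int(np.prod(fmap_shape)))   # 8 values after flattening to 1x8
```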
The training process was performed on the Keras platform (https://keras.io). During model compilation, the Adam49 optimizer was used to compute the gradient. The learning rate was set to 0.001, and the exponential decay rates β1 and β2 were set to 0.9 and 0.999, respectively. The constant epsilon was set to 10−8, and the per-iteration decay of the learning rate was set to zero. We used the sparse categorical cross entropy as the loss function, a special case of the cross-entropy loss for integer labels:

Loss = −(1/N) Σ_{i=1}^{N} log ŷ_{i,y_i}

where N is the number of samples, y_i is the true label of sample i and ŷ_{i,y_i} is the predicted probability assigned to that label. The sparse categorical accuracy function was chosen as the metric to evaluate the performance of the model during training. The batch size was set to 150, and the number of training epochs was set to 1,000. All of the parameters were selected after a series of comparisons and tests, and the final values were chosen because they balance network size and predictive power.
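A minimal NumPy sketch of the sparse categorical cross entropy, assuming the model outputs per-class probabilities and the labels are integers (the example values are illustrative):

```python
import numpy as np

def sparse_categorical_cross_entropy(y_true, y_pred):
    """Mean negative log probability assigned to the true class,
    with integer labels (the 'sparse' form used by Keras)."""
    n = len(y_true)
    return -np.mean(np.log(y_pred[np.arange(n), y_true]))

y_true = np.array([0, 1])             # integer class labels
y_pred = np.array([[0.9, 0.1],        # per-class probabilities per sample
                   [0.2, 0.8]])
loss = sparse_categorical_cross_entropy(y_true, y_pred)
print(round(loss, 4))  # 0.1643, i.e. -(ln 0.9 + ln 0.8)/2
```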
DNA oligonucleotide synthesis. Oligonucleotides purified by ultra-polyacrylamide gel electrophoresis (ULTRA PAGE), and fluorophore-modified oligonucleotides purified by high-performance liquid chromatography, were provided by Sangon Biotech and used without further purification. All strands were shipped lyophilized, resuspended at 200 μM in 1× Tris-acetate-ethylenediaminetetraacetic acid buffer with 12.5 mM Mg 2+ at pH 8.0, and stored at 4 °C for further use.
Annealing protocol and buffer condition. All three-stranded substrates were prepared for annealing at 50 μM with the constituent strands in a 1:1:1 ratio, and all duplexes were prepared at 50 μM with top and bottom strands in a 1:1 ratio, while reporters were prepared at 50 μM with the top quencher strand in 20% excess over the bottom strand. The buffer for all the experiments and annealed complexes was 1× Tris-acetate-ethylenediaminetetraacetic acid with 12.5 mM Mg 2+ at pH 8.0. Complexes were annealed in a thermal cycler (Life Technologies) by heating to 95 °C for 5 min, cooling to 20 °C at a rate of 0.1 °C per 8 s, then keeping at 4 °C. The hybridized molecules were purified by 12% polyacrylamide gel electrophoresis.
Fluorescence spectroscopy. Kinetics experiments were performed with a spectrofluorometer (Fluorolog-max, Horiba) at 25 °C. The instrument allows four experiments to be run in parallel. Fluorescence kinetics data were collected every 30 s, 60 s or 10 min, depending on the overall experimental duration. A maximum of four excitation–emission pairs can be measured in parallel in one experiment. The excitation (emission) wavelengths used were 495 nm (520 nm) for fluorescein (FAM), 530 nm (563 nm) for hexachloro-fluorescein (HEX), 585 nm (605 nm) for carboxy-X-rhodamine (ROX) and 685 nm (705 nm) for cyanine 5.5 (Cy5.5) (Supplementary Fig. 1). For circuits using only one reporter, FAM was used; for circuits using two reporters, FAM and ROX were used. Before the experiments, all cuvettes were washed successively six times with distilled water, once with 70% ethanol and five times with distilled water. In the 8- and 32-pattern recognition experiments, eight output trajectories were recorded using four distinct fluorophores. Every experiment was conducted twice, each time with half of the outputs coupled to fluorophore-labelled reporters and the other half to non-fluorophore-labelled reporters; combining all the output trajectories in one plot allows all eight outputs to be observed simultaneously.
Fluorescence data normalization. All raw fluorescence data were normalized to the standard concentration of the output signals. The difference in fluorescence readout caused by the instrument was negligible. We conducted each set of parallel experiments for the same circuit with different inputs, and these experiments were normalized together for data analysis. For a given fluorophore, in sets of parallel experiments in which at least one of the output signals increased and reached a plateau by the end of the experiment, the maximum level (output of 1) was determined as the average of the last five data points of the completion signal. In sets of parallel experiments in which none of the output signals increased all the way to completion, the maximum level (output of 1) was taken as the highest signal produced by the reporter Rep Yi at the same time point.
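A sketch of this normalization scheme in NumPy for the case in which one trace reached a plateau; the trace values and the function name are illustrative:

```python
import numpy as np

def normalize_traces(traces, completed_idx):
    """Normalize raw fluorescence traces so that output level 1 is the
    mean of the last five data points of the completion signal
    (the trace at row `completed_idx`)."""
    traces = np.asarray(traces, dtype=float)
    max_level = traces[completed_idx, -5:].mean()
    return traces / max_level

# Two toy traces: the first reaches a plateau (the completion signal),
# the second stays well below it
raw = np.array([[0., 20., 40., 48., 50., 50., 50., 50., 50.],
                [0.,  5., 10., 12., 12., 12., 12., 12., 12.]])
norm = normalize_traces(raw, completed_idx=0)
print(norm[0, -1], norm[1, -1])  # 1.0 0.24
```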
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The data and experimental protocols associated with this work are included in the Supplementary Information available in the online version of the paper. Source data are provided with this paper.

Code availability
The code for the algorithm used for the network training in this work is available on Code Ocean and GitHub at https://doi.org/10.24433/CO.3022063.v1 50 and https://github.com/tongzhugroup/DNAcode.

Nature Machine Intelligence
Extended Data Fig. 1 | Five types of molecular structure used in the DNA circuits. 1, The weight substrate molecules N Wt,Ii,j consist of three single strands. The loop portion is initially hybridized with a strand B t to form a rigid double-helix structure, which forces the toehold and recognition domains apart, precluding strand displacement. When the originally bound B t falls off, the stems become complementary to each other and form the hairpin loop structure, which brings the recognition domain and toehold domain into close proximity, thus favoring branch migration through the recognition domain. 2, The summation gate Sd j,k is used to sum all the upstream weighted inputs from the same receptive region. 3 and 4, The complexes Sub n,Yi and double-stranded complexes Dd k,Yi were used for subtraction. 3, Dd k,Yi can react with upstream strands to release the intermediate species Ds k,Yi. Note that Ds k,Yi interacts with the reporter Rep Yi, which has a hairpin loop structure, and we added the spacer domain ('TT') to ensure sufficient binding energy. 4, Sub n,Yi consists of three single strands. To simplify the sequence design, we shortened the inhibitory strand In n, which enables the rigid double-helix structure of Sub n,Yi, to ensure that it can fully react with the upstream output strand. 5, Rep Yi converts the upstream single strand into a concentration-dependent fluorescent reporting signal by toehold-mediated strand displacement. The meanings of the subscript indices of the complexes, enclosed in coloured solid circles in the figure, are listed in the table. Different functional domains are represented by coloured lines.