Density functional theory computation (DFT). DFT computations were performed using the Vienna ab initio simulation package (VASP) with projector-augmented wave (PAW) pseudopotentials and the revised Perdew-Burke-Ernzerhof (RPBE) exchange-correlation functional, with the DFT-D3 method for van der Waals correction. The plane-wave cutoff was set to 600 eV. All atoms were fully relaxed until the energy converged to 1×10−5 eV and the forces to 0.05 eV·Å−1. Spin polarization was applied for structures involving Fe, Co and Ni. The Brillouin zone was sampled with a (4×4×1) k-point grid. See Supplementary Fig. 14 for model construction. Test computations were performed to assess the influence of temperature and solvent (Supplementary Table 1), and electrochemical working potential (Supplementary Figs. 15 and 16, Supplementary Table 2) on the C-C coupling step. Importantly, these factors were found to have negligible impact on coupling energy and were therefore ignored in construction of the big dataset.
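The settings above can be collected into a VASP input sketch. The following is a minimal illustration, not the input files used in this work: the mapping of the stated convergence criteria onto EDIFF and EDIFFG, the choice IVDW = 11 (zero-damping D3; the damping variant is not stated in the text), and the helper names build_incar and render_incar are all assumptions.

```python
# Elements for which spin polarization was applied, as stated in the text.
MAGNETIC = {"Fe", "Co", "Ni"}


def build_incar(elements):
    """Return INCAR tags for a structure containing the given elements.

    The tag values mirror the settings quoted in the text; the tag
    assignments themselves are assumptions made for illustration.
    """
    return {
        "GGA": "RP",        # RPBE exchange-correlation functional
        "IVDW": 11,         # DFT-D3, zero damping (assumed variant)
        "ENCUT": 600,       # plane-wave cutoff in eV
        "EDIFF": 1e-5,      # energy convergence in eV (assumed mapping)
        "EDIFFG": -0.05,    # negative value: force criterion in eV/Angstrom
        "ISPIN": 2 if MAGNETIC & set(elements) else 1,
    }


def render_incar(tags):
    """Format the tag dictionary as INCAR text, one tag per line."""
    return "\n".join(f"{key} = {value}" for key, value in tags.items())


if __name__ == "__main__":
    print(render_incar(build_incar(["Cu", "Fe", "Nb"])))
```

The k-point grid (4×4×1) would go in a separate KPOINTS file; it is omitted here because the text does not state whether the grid is Gamma-centred.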
Adsorption energy and coupling energy. The C-C coupling big data consisted of 45,738 adsorption energies (Ead), covering 6 precursors (CO, COH, CHO, CH, CH2 and CH3) and 21 C2 combinations (CO-CO, CO-COH, CO-CHO, CO-CH, CO-CH2, CO-CH3, COH-COH, COH-CHO, COH-CH, COH-CH2, COH-CH3, CHO-CHO, CHO-CH, CHO-CH2, CHO-CH3, CH-CH, CH-CH2, CH-CH3, CH2-CH2, CH2-CH3 and CH3-CH3). Ead was computed from:
Ead = Etotal - Esubstrate - Especies (1)
where Etotal is the DFT total energy of the substrate with adsorbate, Esubstrate is the energy of the corresponding pristine substrate, and Especies is the energy of the adsorbate species referenced to CO2, H2O and H2 molecules.
To describe ‘difficulty’ of the coupling reaction, the coupling energy (Ecplg) is defined as the difference between the total energy following coupling, Etotal(Product), and the total energy of the two precursors, Etotal(Precursor1) and Etotal(Precursor2), before coupling, namely:
Ecplg = [Etotal(Product) + Esubstrate] - [Etotal(Precursor1) + Etotal(Precursor2)] (2)
By substituting Ead for Product, Precursor1 and Precursor2 into Eq. (2) and simplifying, the following is obtained:
Ecplg = Ead(Product) - [Ead(Precursor1) + Ead(Precursor2)] (3)
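As a numerical check, Eqs. (1) to (3) can be sketched as follows. All energies below are made-up placeholder values, not DFT results. The sketch assumes that the reference energy of the product equals the sum of the reference energies of the two precursors (same atoms, same reference molecules), which is the condition under which Eq. (2) reduces to Eq. (3).

```python
def adsorption_energy(e_total, e_substrate, e_species):
    """Eq. (1): Ead = Etotal - Esubstrate - Especies."""
    return e_total - e_substrate - e_species


def coupling_energy_from_totals(e_total_product, e_substrate, e_total_p1, e_total_p2):
    """Eq. (2): coupling energy written in terms of total energies."""
    return (e_total_product + e_substrate) - (e_total_p1 + e_total_p2)


def coupling_energy_from_ads(ead_product, ead_p1, ead_p2):
    """Eq. (3): coupling energy written in terms of adsorption energies."""
    return ead_product - (ead_p1 + ead_p2)


# Placeholder energies in eV (hypothetical, for illustration only).
e_substrate = -300.0
e_species = {"CO": -1.2, "CHO": -0.7}
# Assumption: the product reference is the sum of the precursor references.
e_species["CO-CHO"] = e_species["CO"] + e_species["CHO"]
e_total = {"CO": -302.1, "CHO": -301.9, "CO-CHO": -303.4}

ead = {
    name: adsorption_energy(e_total[name], e_substrate, e_species[name])
    for name in e_total
}
ecplg_eq2 = coupling_energy_from_totals(
    e_total["CO-CHO"], e_substrate, e_total["CO"], e_total["CHO"]
)
ecplg_eq3 = coupling_energy_from_ads(ead["CO-CHO"], ead["CO"], ead["CHO"])

if __name__ == "__main__":
    print(f"Eq. (2): {ecplg_eq2:.3f} eV, Eq. (3): {ecplg_eq3:.3f} eV")
```

Both routes give the same value because the Especies terms cancel and one Esubstrate term remains, which is why Eq. (3) needs only the tabulated adsorption energies.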
DFT calculations were used for the verification of the overall reaction pathway on selected catalyst surfaces with additional computation for intermediates C, CHOH, CH2O, and CH2OH.
Enumerating search space. Substrates: The active site is a tri-atom ensemble ABCu, where A and B can each be substituted by one of 27 metals, giving 378 possible compositions (unordered A-B pairs including A = B, that is, 27 × 28/2). Adsorption sites: A total of 121 surface structures were constructed for the tri-atom active site. The hollow position of the active site accommodates adsorbates CO, COH, CH, CH2, C, CHOH and CH2O, each with a single adsorption configuration. On the bridge position, CH3 adsorbs with 3 different configurations. For symmetric adsorbates CO-CO, COH-COH, CHO-CHO, CH-CH, CH2-CH2 and CH3-CH3, one C adsorbs on the bridge site whilst the other adsorbs on the top site, resulting in 3 adsorption configurations for each. Similarly, for asymmetric adsorbates CO-COH, CO-CHO, CO-CH, CO-CH2, CO-CH3, COH-CHO, COH-CH, COH-CH2, COH-CH3, CHO-CH, CHO-CH2, CHO-CH3, CH-CH2, CH-CH3, CH2-CH3, CHO and CH2OH, one C or O adsorbs on the bridge site and the other C or O adsorbs on the top site, resulting in 6 adsorption configurations for each. In addition, the pristine substrate without any adsorbate also needs to be calculated.
Multiplication of 378 substrate compositions and 121 adsorption configurations gives 45,738 entries in the dataset. The construction of these models and simulations was carried out in the Atomic Simulation Environment (ASE) package.
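The size of the search space can be reproduced with a short enumeration. This is a sketch under two assumptions: the metal labels M1 to M27 are placeholders (the 27 metals are not listed in this section), and A-B pairs are treated as unordered with A = B allowed, which is the counting that yields 378. The figure of 121 structures per substrate is taken from the text rather than derived.

```python
from itertools import combinations_with_replacement

# Placeholder labels; the actual 27 metals are not listed in this section.
metals = [f"M{i}" for i in range(1, 28)]

# Unordered (A, B) pairs with repetition, giving ABCu compositions.
substrates = list(combinations_with_replacement(metals, 2))

# The six C1 precursors named in the text and their pairwise C2 combinations.
precursors = ["CO", "COH", "CHO", "CH", "CH2", "CH3"]
c2_combinations = [
    "-".join(pair) for pair in combinations_with_replacement(precursors, 2)
]

# Number of surface structures per substrate, as stated in the text.
structures_per_substrate = 121
dataset_size = len(substrates) * structures_per_substrate

if __name__ == "__main__":
    print(len(substrates), len(c2_combinations), dataset_size)
```

The same call to combinations_with_replacement also yields the 21 C2 combinations in the order in which they are listed in the text.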
Graph neural network training and prediction. A number of graph neural network (GNN) models were used to learn and predict adsorption energies for the big C-C coupling dataset. The training set consisted of 2,500 adsorption energies obtained via iterative sampling, whilst the prediction set comprised all 45,738 adsorption energies. Features included the initial structure of each adsorption species, as depicted in Supplementary Fig. 4, in addition to a ‘fingerprint’ corresponding to the atoms in the respective structures.
It is commonly reported that information from multiple sources, including images, texts and time-series data, can be used to boost the predictive performance of a model. We hypothesized that GNN models built solely on 3D information could be extended to incorporate other data sources to further improve performance.
To evaluate the performance of the individual models and of the 2D-3D ensemble, SchNet, DimeNet++, GIN, GraphSAGE and GCN were assessed. A combination of DimeNet++ and GCN was used to build the big data.
SchNet
SchNet31,32 is a ‘deep-learning’ model designed to efficiently capture quantum interactions within molecular systems. It treats atoms as nodes and 2-body terms as edges, and uses message passing and interaction terms to model the continuous spatial representation of atomic environments, enabling accurate predictions of molecular properties including energy and forces.
DimeNet++
DimeNet++33,34 is an extension of DimeNet, optimized to predict molecular properties. In contrast to SchNet, it treats 2-body terms as nodes and 3-body terms as edges. DimeNet++ incorporates interatomic distances and angles to boost representational capacity.
GIN
Graph Isomorphism Network (GIN) is designed to predict graph-level properties or labels. In the GIN used here, the edges are constructed based on bond connectivity of the nodes. GIN has reportedly been applied widely, including in social network analysis and bioinformatics.
GraphSAGE
Graph Sample and Aggregator (GraphSAGE) is a scalable, inductive-learning framework that enables efficient representation learning on large graphs. Via sampling and aggregating node-level features from local neighborhoods, GraphSAGE captures structural information of graphs.
GCN
Graph Convolutional Network (GCN)35 is a fundamental graph-based deep-learning model. It introduces graph convolutions to iteratively update node representations by aggregating information from neighboring nodes.
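The neighbor aggregation described above can be illustrated with a single graph-convolution step in plain Python. This is a minimal sketch of the standard symmetric-normalized propagation rule with self-loops and a ReLU activation, not the trained model used in this work; the three-node graph, features and weights are arbitrary toy values.

```python
import math


def gcn_layer(adjacency, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adjacency: n x n list of 0/1 entries (undirected graph, no self-loops)
    features:  n x f list of node features H
    weight:    f x o list of layer weights W
    """
    n = len(adjacency)
    # Add self-loops so that each node keeps its own features.
    a_hat = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization by the square roots of the node degrees.
    norm = [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    n_in, n_out = len(weight), len(weight[0])
    # Aggregate neighbor features, then apply the linear map and ReLU.
    aggregated = [
        [sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(n_in)]
        for i in range(n)
    ]
    return [
        [
            max(0.0, sum(aggregated[i][f] * weight[f][o] for f in range(n_in)))
            for o in range(n_out)
        ]
        for i in range(n)
    ]


# Toy path graph 0-1-2 with a single feature set on node 0 only.
toy_adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
toy_features = [[1.0], [0.0], [0.0]]
toy_weight = [[1.0]]
toy_output = gcn_layer(toy_adjacency, toy_features, toy_weight)

if __name__ == "__main__":
    print(toy_output)
```

After one step the feature on node 0 has spread to its direct neighbor (node 1) but not yet to node 2, which shows why several stacked layers are needed to propagate information across a graph.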
To leverage the complementary strengths of 3D-based and 2D-based GNNs, we devised a 2D-3D ensemble model that integrates predictions from both modalities. This ensemble improved predictive performance and robustness, that is, the ability of a model to maintain its performance and generalization across diverse datasets.
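One simple way to integrate predictions from a 3D model and a 2D model is a weighted average. The sketch below assumes that combination rule purely for illustration; the text does not specify how the predictions were merged, and the function names, the weight and all numbers are hypothetical.

```python
def ensemble_predict(pred_3d, pred_2d, weight_3d=0.5):
    """Weighted average of two prediction lists (assumed combination rule)."""
    if len(pred_3d) != len(pred_2d):
        raise ValueError("prediction lists must have the same length")
    return [
        weight_3d * p3 + (1.0 - weight_3d) * p2
        for p3, p2 in zip(pred_3d, pred_2d)
    ]


def mean_absolute_error(predictions, targets):
    """Mean absolute error between predictions and reference values."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


# Made-up adsorption energies in eV, chosen so that the two models err
# in opposite directions; real model errors need not behave this way.
reference = [-1.0, 0.5, 0.2]
pred_3d = [-0.9, 0.6, 0.1]
pred_2d = [-1.2, 0.45, 0.35]
pred_ensemble = ensemble_predict(pred_3d, pred_2d)

if __name__ == "__main__":
    for name, pred in [("3D", pred_3d), ("2D", pred_2d), ("ensemble", pred_ensemble)]:
        print(name, round(mean_absolute_error(pred, reference), 4))
```

Averaging helps only when the errors of the two models are partly uncorrelated, as in this toy data; the weight would normally be chosen on held-out data.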
Chemicals. Copper(II) nitrate trihydrate (Cu(NO3)2·3H2O, AR), silver(I) nitrate (AgNO3, AR), ammonium niobate(V) oxalate hydrate (C2H5NNbO4, AR), potassium hydroxide (KOH, AR), sodium hydroxide (NaOH, AR) and ascorbic acid (AA, AR) were purchased from Sigma-Aldrich. Carbon dioxide (CO2, 99.999%) was purchased from BOC Gas (Australia). All chemicals were used as received without further modification. Water (18 MΩ·cm) used in experiments was obtained from an ultra-pure purification system.
Preparation of catalysts. The CuAgNb catalyst was prepared via electroreduction of pre-synthesized Cu2O-AgNb catalysts on carbon paper during CO2RR as follows: 0.5 mL NaOH solution (1 M), 0.5 mL Cu(NO3)2 solution (0.1 M) and 0.025 mL C2H5NNbO4 solution (0.03 M) were added to a 20 mL vial containing 9 mL water under vigorous stirring for 5 min at room temperature. 18 mg AA was then added to the vial under stirring. Following stirring for 30 min, 0.1 mL AgNO3 solution (0.01 M) was added to the vial, and stirring continued for 30 min. The resulting products were collected via centrifugation, washed with ethanol and dispersed in 0.8 mL ethanol. The ethanol dispersion was mixed with 20 µL 5 m/m% Nafion solution and sprayed onto carbon paper using an airbrush. The loading mass of catalyst on carbon paper was controlled to ca. 0.5 mg cm−2. Carbon paper sprayed with pre-synthesized Cu2O-AgNb catalysts was used as the working electrode and reduced at a current density of 200 mA cm−2 during CO2RR to obtain the CuAgNb catalyst. Cu and CuAg catalysts were prepared similarly, but without addition of the corresponding precursors AgNO3 or C2H5NNbO4 during pre-synthesis.
Characterizations. High-angle annular dark-field imaging and EDS mapping were performed on a FEI Titan Themis 80–200 operating at 200 kV. XRD data were collected on a Rigaku MiniFlex 600 X-ray diffractometer. SEM images were collected using a FEI QUANTA 450 FEG Environmental SEM operating at 10 kV.
Electro-coupling performance tests. Electroreduction of CO2 was tested in a microfluidic flow cell that consisted of two electrolyte chambers and one gas chamber. An anion exchange membrane (Fumasep FAB-PK-130) was placed between the two electrolyte chambers for product separation and ionic conduction. Catalyst-deposited carbon paper (YSL-30T), a micro Ag/AgCl electrode (4.0 M KCl) and Ni foam (0.5 mm thickness) were used as working electrode, reference electrode and anode, respectively. The working electrode (active area: 0.5 × 2 cm2) was placed between the gas and catholyte chambers to ensure gaseous CO2 diffusion and reaction at the catholyte/catalyst interface. An electrochemical workstation (CHI760) with a current amplifier was used to perform the CO2RR test. 1 M KOH (20 mL) was circulated through the electrolyte chambers at a constant flow of 20 mL min−1 via peristaltic pumping. CO2 was supplied to the gas chamber via a mass flow controller at a constant flow of 30 mL min−1. Reactions were tested via chronopotentiometry at different currents for 1 h without iR correction. Gaseous products were analyzed via GC equipped with a PLOT MolSieve 5A column and a Q-bond PLOT column. Liquid products were characterized via HPLC (Thermo Scientific RefractoMax 520) with a Bio-Rad Aminex HPX-87H column. All potentials were converted to the RHE scale with iR correction according to ERHE = E(versus Ag/AgCl) + 0.059 × pH + 0.210 + 0.85 × iR, where i is the current at each applied potential and R is the equivalent series resistance measured via electrochemical impedance spectroscopy in the frequency. FE for reaction products was computed from FE = eF × n/Q = eF × n/(I × t), where e is the number of electrons transferred, F the Faraday constant, Q the charge, I the current, t the running time and n the total amount of product (mol).
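The two expressions above can be evaluated as in the following sketch. The function names and all input values are hypothetical. The potential conversion applies the expression literally; the sign convention for i is not specified in the text, so the current is passed through unchanged.

```python
FARADAY = 96485.332  # Faraday constant in C mol-1


def to_rhe(e_vs_ag_agcl, ph, current, resistance, compensation=0.85):
    """ERHE = E(versus Ag/AgCl) + 0.059 * pH + 0.210 + 0.85 * i * R.

    Potentials in V, current in A, resistance in ohm. The sign of the
    current is used exactly as supplied (convention not stated in the text).
    """
    return e_vs_ag_agcl + 0.059 * ph + 0.210 + compensation * current * resistance


def faradaic_efficiency(electrons, moles, current, time_s):
    """FE = e * F * n / (I * t), returned as a fraction between 0 and 1."""
    charge = current * time_s  # Q = I * t in coulombs
    return electrons * FARADAY * moles / charge


# Hypothetical example: 0.2 A applied for 1 h, 1 mmol of a 2-electron product.
example_fe = faradaic_efficiency(electrons=2, moles=1.0e-3, current=0.2, time_s=3600)
# Hypothetical example: -1.5 V versus Ag/AgCl at pH 14, i = 0.2 A, R = 2 ohm.
example_e_rhe = to_rhe(-1.5, ph=14, current=0.2, resistance=2.0)

if __name__ == "__main__":
    print(f"FE = {100 * example_fe:.1f} %, ERHE = {example_e_rhe:.3f} V")
```

For a 1 cm2 electrode, 0.2 A corresponds to the 200 mA cm−2 current density quoted for catalyst reduction, which is why that value is used in the example.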