Towards a Robotic Scientist for Synthesis of Nanocrystals

Data-driven materials synthesis is heralded as a new paradigm to substitute for trial-and-error experiments and labor-intensive tasks by human scientists. Herein, a Robotic Scientist platform that can deliver unprecedented performance for rational design, controllable synthesis, and retrosynthesis of nanocrystals is described. By taking advantage of interdisciplinary fields including artificial intelligence, robotic automation, and big data, the Robotic Scientist platform is trained to synthesize Au nanocrystals. Existing knowledge and machine learning models are integrated into the rational design process. Controllable synthesis is achieved by synergistic coupling of robot-assisted synthesis on the macro-scale and nanocrystal growth on the nano-scale. By means of the Robotic Scientist platform, over 2,300 samples are synthesized in conjunction with in-situ characterization to accomplish the complete task of design-synthesis-retrosynthesis. The platform and methodology of Robotic Scientist pave the way for digital synthesis of nanocrystals and facilitate the paradigm shift to data-driven materials synthesis.


Introduction
Data-driven materials synthesis is heralded as a new paradigm to transfer labor-intensive tasks and trial-and-error experiments from human scientists to robotic chemists 1 or chemical synthesis machines 2. The advanced Human-AI-Robot collaboration system is accelerating the interdisciplinary revolution of materials synthesis towards a Robotic Scientist for automated synthesis. In this emerging field, it is necessary to converge chemical knowledge, theoretical models, purpose-oriented databases, programmable cyber systems, as well as robotic physical systems. One of the promising missions is digital synthesis of materials 3 by acquiring knowledge progressively, unveiling data linkages efficiently, and developing solutions constructively over time based on previous iterations.
In the past decade, tremendous efforts have been devoted to developing digital manufacturing/synthesis of materials. In particular, layer-by-layer digital additive manufacturing of three-dimensional materials has been developed on the macro-scale 4. On the micro-scale, synthetic biology is another benchmark for digital synthesis of biomaterials utilizing cells as the hardware and genes as the software 5. Recently, there has been rapid development in organic programming language 6 and automated platforms 2,7 for organic synthesis on a small scale. At the same time, a robotic chemist has been reported to search for photocatalysts 1, thus opening up the opportunity for robot-assisted inorganic materials investigation on the nano-scale. However, there are still many limitations hampering automated synthesis, for example, materials search without theoretical models 1, blind materials optimization without scientific methodologies 1, as well as lack of synergy between hardware and software to achieve materials innovation 2. Herein, we show how these difficulties can be tackled by the Robotic Scientist platform that enables rational design, controllable synthesis, and retrosynthesis of nanocrystals as a proof of concept.

Results
Robotic Scientist platform. The platform towards a Robotic Scientist for digital synthesis of nanocrystals involves convergence of the materials databases, cyber systems, and physical systems (Fig. 1).
To accomplish rational design of nanocrystals, the software and AI algorithms are integrated into the cyber system. In addition, process automation by means of a simulated operation system is utilized to pre-examine and monitor the designed synthesis procedures. In the physical system, crystal growth on the nano-scale is accomplished by automatic synthesis and characterization is performed on the macro-scale to guide controllable synthesis. Concurrently, Robotic Execution Excel (REE) files are designed to provide preliminary instructions for the execution of automatic synthesis using crucial parameters. The database is expanded continuously by the design and controllable synthesis processes. Furthermore, the relationship between the target nanocrystal morphologies (as outputs) and key synthesis parameters in the database (as inputs) is identified to provide constructive guidance to achieve retrosynthesis. Finally, the closed loop combining rational design, controllable synthesis, and retrosynthesis provides the unprecedented ability to manipulate the morphologies of nanocrystals. It is expected that the Robotic Scientist can be trained for digital synthesis of customized nanocrystals with the essential capacities similar to those provided by human scientists.
In manual synthesis, the tasks are normally time consuming and error prone and moreover, the raw precursors expire or degrade shortly after preparation in some cases. In order to achieve automatic synthesis in a timely fashion, the Robotic Scientist platform is set up with many desirable features as shown in Fig. 2. The robot, robotic arms, digital pipettors, mobile camera, and microplate reader are connected to a series of modules that are capable of performing robot-assisted high-throughput synthesis and in-situ characterization. The photograph, schematic representation, and operation video of the platform are presented in Fig. 2a, Fig. 2b, and SI, respectively. The Robotic Scientist platform is expected to revolutionize traditional synthesis processes that rely on well-trained scientists and technicians.
Rational design. Traditionally, the manual chemical and materials synthesis processes differ slightly from person to person and sometimes introduce inadvertent errors/bias leading to diverse outcomes. Moreover, it typically takes several months and even years for a scientist to acquire the required repertoire of synthetic knowledge. Hence, there is a substantial demand to conduct rational design on the Robotic Scientist platform while leveraging the expertise of human scientists. Here, crystal informatics, existing knowledge about synthesis, thermodynamic models, and kinetic models as data-driven scientific hypotheses are integrated into the Robotic Scientist for rational design of nanocrystals (Fig. 3).
Firstly, a crystal database with over 90,000 different crystal facets from seven crystal systems is incorporated into the Robotic Scientist based on our previous research 8. The typical morphologies in the cubic system are identified in Fig. 3a and Fig. 3b. The morphology information is then digitally converted to the fractional surface area (FSA) and aspect ratio (AR) and the correlations are analyzed as shown in Fig. S1, in which the FSAs of the (001) and (00$\bar{1}$) planes versus AR of the corresponding nanorods are identified, revealing a gradually decreasing trend (Fig. 3a). Afterwards, by exploiting the advantages of the artificial neural network (ANN) model to understand the complex morphology evolution process, the relationship between the crystal equilibrium morphology (FSA and AR related) and surface energy ratio is established using a well-trained ANN model (Fig. S2) based on the crystal informatics database.
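The decreasing trend of the end-facet FSA with AR can be illustrated with a simple geometric sketch. It assumes an idealized nanorod shaped as a square prism bounded only by {100}-type facets, with the (001) and (00-1) planes as the two end caps; the function name and the prism idealization are illustrative assumptions, not the crystal database or ANN model used in this work.

```python
def end_facet_fsa(aspect_ratio: float) -> float:
    """Combined fractional surface area of the two end facets of a square prism.

    Assumption: the rod has a square cross-section of side a = 1 and length
    aspect_ratio * a, so the two ends contribute 2 * a^2 and the four side
    facets contribute 4 * a^2 * aspect_ratio.
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    end_area = 2.0
    side_area = 4.0 * aspect_ratio
    return end_area / (end_area + side_area)


# The end-facet FSA shrinks monotonically as the rod gets longer.
for ar in (1, 2, 4, 8):
    print(ar, round(end_facet_fsa(ar), 4))
```

For this idealized shape the end-facet FSA reduces to 1 / (1 + 2·AR), which equals 1/3 for a cube and falls towards zero for long rods.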
To train the Robotic Scientist, Au nanocrystal synthesis knowledge with key parameters is extracted from 1,300 related publications by data mining with the aid of the Automated Literature Recommendation System 9. Fig. S3 shows the frequency distribution of the synthesis parameters reported in the literature and Fig. 3c indicates that L2 is the most frequently used concentration level. Hence, by taking advantage of data mining, the Robotic Scientist is initially trained to capture synthesis parameters and the identified parameters are then adopted by the Robotic Scientist platform to refine predictions. For example, longitudinal surface plasmon resonance (LSPR) can be characterized in-situ by the Robotic Scientist platform (Fig. S4 and Table S1) and some of the samples are characterized by ex-situ TEM (Fig. S5) [...] (Fig. 3d). Therefore, with the assistance of ML and thermodynamic models, the relationship among morphology, surface energy, LSPR, and [Ag+] concentration is established by the Robotic Scientist platform.
Establishment of the thermodynamic model allows the Robotic Scientist platform to realize rational design of desirable nanocrystals using the concentration of synthesis parameters as the input, surface energy and LSPR as the bridge, and nanocrystal morphology as the output.
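As a worked step, the Gibbs adsorption isotherm and the Langmuir adsorption model named in the Methods can be combined by direct integration. The sketch below assumes the standard textbook forms of both relations and introduces $\sigma_0$, the surface tension at zero additive concentration, a symbol that does not appear in the original text.

```latex
\begin{align}
\Gamma &= -\frac{c}{RT}\,\frac{d\sigma}{dc},
\qquad
\Gamma = \Gamma_s\,\frac{\alpha c}{1+\alpha c} \\
\frac{d\sigma}{dc} &= -RT\,\Gamma_s\,\frac{\alpha}{1+\alpha c} \\
\sigma(c) &= \sigma_0 - RT\,\Gamma_s \ln\!\left(1+\alpha c\right)
\end{align}
```

Under these assumptions the surface tension falls logarithmically with additive concentration, which is one way a concentration input could be mapped onto a change in surface energy.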
The kinetics in nanocrystal synthesis is another key model in rational design that can train the Robotic Scientist for tailoring morphology. In this respect, a microplate reader and color-ultra-sensitive camera are employed to monitor the UV-Vis-NIR absorption spectra and color changes during nanocrystal growth. The dynamic-state and steady-state optical absorption spectra are displayed in Figs. S7-S10 together with representative results in Fig. 3f-3h for different c(HCl). The dynamic UV-Vis-NIR absorption spectra with peaks of LSPR and transverse surface plasmon resonance (TSPR) are identified in Fig. 3f. The normalized OD_LSPR change with time is shown in Fig. 3g, which indicates pseudo-first-order kinetics (derived in the Methods section and shown in Fig. S7 and Table S2). A similar trend showing the color change (RGB values) with time is presented in Fig. 3h and Fig. S7. These in-situ characterization results are employed to establish the nanocrystal kinetic models. Hence, the Robotic Scientist is guided by the thermodynamic and kinetic models with ML trained models to explore controllable synthesis and retrosynthesis.
Controllable synthesis. The complexity of materials synthesis increases exponentially with the number of variables, thereby stifling full exploration of the materials space. The key to the controllable synthesis process is convergence of macro-scale automatic synthesis and nano-scale crystal growth to bridge the synthesis parameters (as input) and corresponding morphologies (as output) on the Robotic Scientist platform. In order to achieve this objective, data-intensive rational design and automated synthesis are integrated. Meanwhile, machine learning and experimental data are utilized to construct models based on the appropriate synthesis variables. As a result, orthogonal, single-, double-, and triple-factor experiments can be conducted systematically in the order of iterations to construct the database for effective training of ML models.
Firstly, orthogonal experiments are conducted by executing materials synthesis with parameters obtained by data mining from 1,300 papers (Fig. 3c). They are designed to address the limitations of blind optimization for all the factors at different levels 1. The design of experiments with different factors and levels (Table S3), UV-Vis-NIR absorption results (Fig. S11), and multivariate analysis of the variance (Tables S4-S5) are presented. Based on the experimental conditions from the high-dimensional experimental space, the initial optimized levels are decided for further single-factor study.
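A pseudo-first-order rate constant can be estimated from a time series by a straight-line fit of the logarithm of the signal against time. The sketch below is a minimal illustration on synthetic data, assuming the monitored signal decays as exp(-k·t); the function name, time points, and rate constant are hypothetical and do not reproduce the fitting procedure or the values in Table S2.

```python
import math


def fit_first_order(times, signal):
    """Least-squares fit of ln(signal) = ln(s0) - k * t.

    Returns (k, s0). All signal values must be positive.
    """
    ys = [math.log(s) for s in signal]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return -slope, math.exp(intercept)


# Synthetic, noise-free decay with an assumed rate constant (illustrative only).
true_k = 0.12  # per minute, hypothetical
times = [0, 5, 10, 15, 20, 30]
signal = [math.exp(-true_k * t) for t in times]

k, s0 = fit_first_order(times, signal)
print(f"k = {k:.4f} per minute, s0 = {s0:.4f}")
```

With real measurements the fit would be applied to the normalized optical density after subtracting any baseline, and the scatter of the residuals would indicate how well the pseudo-first-order assumption holds.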
To analyze the potentials in 1D space, 24 levels are studied for each single factor (Table S6) [...] Tables S7-S8 and Fig. S13. All the results can be fitted well with the ML predicted models, which is beyond the capacity of the classical model (fitted only to the results of the AgNO3 factor) in Fig. 3d. Primarily, there is a broader range of AR tuned by CTAB, AgNO3, and HCl (compared with Au seeds, AA, and HAuCl4), which are identified and defined as structure-directing agents (SDAs) [11][12][13][14]. The different types of SDAs can be used as triggers on the macro-scale to control the surface energy during nanocrystal growth on the nano-scale. For example, the factor of AgNO3 can be adjusted by the Robotic Scientist platform to change the AR values of the nanocrystals in Fig. 4a. Therefore, the relationship between the SDA-based synthesis parameters (inputs) and nanocrystal morphologies (outputs) is identified as the key to achieving controllable synthesis.
To train a sophisticated Robotic Scientist, double-factor experiments are conducted for two identified SDAs from the single-factor experiments. In this way, the chemical space is expanded into the 2D response surface with an 8×8 grid (64 experiments) compared to 1D curves derived from single-factor experiments. Based on 64 preliminary experiments, 96 experimental conditions are generated by a normal distribution mathematical array for active training of the ML model. The design of the double-factor experiments and ML predicted models are presented in Tables S9-S13 and the results are illustrated in Figs. S14-S16. The robust double-factor ML models are then trained with two inputs for morphological control. It is found that CTAB and AgNO3 play dominant roles and there are noticeable interactions (Fig. 4d), which are consistent with the observation that CTAB and Ag+ form a face-specific capping agent to achieve cooperative morphological control 15. Interestingly, the CTAB and HCl factors exhibit similar cooperative morphology control (Fig. 4e). However, there is only additive behavior for the AgNO3 and HCl factors (Fig. 4f) and AgNO3 plays a leading role in the two-factor experiment. A complex response profile is created for the three-factor experiments by adjusting three SDAs. The design of triple-factor experiments and ML predicted models are shown in Tables S14-S16. The visualized response of AR to the three factors is presented in Fig. 4g and Fig. S17. Therefore, the function of the SDAs' parameters as inputs and AR features as outputs can be established for controllable synthesis of the nanocrystals in a free 3D space.
At the same time, the color features as potential outputs can be investigated by the Robotic Scientist platform. The results from single-factor, double-factor, and triple-factor experiments are shown in Figs. S18-S20, respectively, and the corresponding LSPR and RGB values are listed in Tables S17-S19 for ML training. The trained ML model and the comparison between experimental and ML predicted values are given in Table S20 and Fig. S21; the model is satisfactory, with an R² of 0.94. As shown in Fig. 4h, the color results as another large-sample data-set match the spectra well. In this way, the Robotic Scientist can be trained to digitally recognize colors, thus contributing to the materials genome database with color features.
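The 8×8 double-factor design can be pictured as the Cartesian product of eight levels for each of two SDAs. The sketch below generates such a grid with evenly spaced levels; the concentration ranges, the even spacing, and the function names are hypothetical placeholders, since the actual levels are given in Tables S9-S13.

```python
from itertools import product


def linspace(lo: float, hi: float, n: int) -> list:
    """Return n evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def double_factor_grid(range_a, range_b, levels: int = 8) -> list:
    """Full-factorial grid of (factor_a, factor_b) conditions."""
    a_levels = linspace(range_a[0], range_a[1], levels)
    b_levels = linspace(range_b[0], range_b[1], levels)
    return list(product(a_levels, b_levels))


# Hypothetical concentration ranges (arbitrary units) for two SDAs.
grid = double_factor_grid((0.05, 0.20), (0.02, 0.16))
print(len(grid))  # 64 conditions, one per well of the 8x8 layout
```

Each tuple corresponds to one well, so the 64 conditions fit on a single 96-well microplate with room left for controls.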
Finally, with the aid of the Robotic Scientist platform, over 2,300 samples are synthesized together with in situ characterization to build up the Au nanocrystals genome (various morphologies with LSPR from 600 to 1,000 nm) (Fig. 4i). It is estimated that this task would have taken a human scientist up to four months (18 samples per day) in comparison with less than one week (384 samples on four 96-well microplates per day) taken by the Robotic Scientist. The Robotic Scientist continues to improve by receiving training with expanding experimental data and ML predicted data to realize the ultimate goal of an intelligent system for digital nanocrystal synthesis and potential of retrosynthesis based on the data sources as described in the next section.
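The time comparison follows from simple division of the sample count by the daily throughput quoted above. The short sketch below reproduces that arithmetic; the helper name is illustrative, and the sample count is taken as exactly 2,300 for the calculation.

```python
import math


def days_needed(n_samples: int, samples_per_day: int) -> int:
    """Whole working days required to process n_samples at a fixed daily rate."""
    return math.ceil(n_samples / samples_per_day)


manual_days = days_needed(2300, 18)   # human scientist, 18 samples per day
robot_days = days_needed(2300, 384)   # four 96-well microplates per day

print(manual_days)  # 128
print(robot_days)   # 6
```

About 128 working days corresponds to roughly four months of manual effort, whereas 6 days is under one week on the platform.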
Retrosynthesis. The Robotic Scientist is further developed with the intention of retrosynthesis based on the learned knowledge from controllable synthesis. The Au nanocrystals genome plays a vital role in supporting a closed-loop synthesis process. The genome with typical LSPR from 600 to 1,000 nm displayed in Fig. 5a consists of experimental data, ML predicted data, and TEM validation results (Fig. 5b). Building such a genome within a six-variable experimental space seems like an impossible task with the manual approach due to the experimental complexity that scales exponentially with the number of variables 1. The relationship between the identified SDAs and morphologies is illustrated as 'Input' and 'Output' in Fig. 5a, respectively. By normalizing different parameters of SDAs to form different nanocrystal morphologies with the trigger of the surface energy on the nano-scale, precise morphological control is accomplished. The genome is constructed for effective retrosynthesis (Fig. 5c) and efficient scale-up production of Au nanocrystals (Fig. 5d and 5e) to facilitate digital synthesis of Au nanocrystals.
Retrosynthesis and optimization are the Robotic Scientist's creative endeavors. The data of the target Au nanocrystals (such as LSPR of 808 ± 10 nm, 780 ± 10 nm, and 633 ± 10 nm), which are commonly used in biotechnology and information technology (for example, HIV drug delivery 16, surface-enhanced Raman scattering 17, wireless neuromodulation 18, and sensing 19,20), are extracted from the genome for further retrosynthesis study. Using 808 ± 10 nm as an example, 99 samples are selected from previous single-, double-, and triple-factor experiments as shown in Fig. 5c and Table S21. At the same time, by focusing on the best samples, additional samples are predicted by the ML models. Afterwards, optimization experiments are executed by the Robotic Scientist platform. It is generally accepted that a larger OD ratio (OD_LSPR/OD_TSPR) and narrower FWHM (at fixed LSPR) represent more uniform morphology. Hence, the experiments are designed with a decision plate to optimize the target nanocrystals with higher shape uniformity by evaluating the OD ratio and FWHM of samples from different synthesis routes (Fig. 5c and Table S22). Finally, the samples with the best quality in the decision plate are recommended for the scale-up experiments.
Three scale-up experiments are conducted sequentially, i.e., high-throughput microplate assay on the Robotic Scientist platform (Fig. 5d), bench-scale test on a magnetic stirrer, and pilot-scale test in an agitated vessel (Fig. 5e). Firstly, 2 mL (on a 12-well microplate), 4 mL (on a 6-well microplate), and 20 mL and 40 mL (on a single-well plate) scale experiments are performed on the Robotic Scientist platform for synthesis of the 633, 780, and 808 nm samples. During the scale-up process, an interesting feature in retrosynthesis is that LSPR gradually red-shifts compared to results in the nanocrystal genome (Fig. 5d), which provides new insights into the scaling law. By taking advantage of the kinetics study, an SDA (such as HCl) is identified as an effective input for minor modification in the scale-up process. A slight decrease of c(HCl) adjusts the LSPR according to the established scaling law. Modification by adjusting c(HCl) is proven to be applicable and then a pilot-scale experiment (15 L) is demonstrated in an agitated vessel (Fig. S22). Therefore, this study reveals retrosynthesis and scale-up methodology by taking advantage of the Au nanocrystals genome and kinetics study on the Robotic Scientist platform, which is expected to have broad applications in the production of similar nanocrystals.
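The decision-plate evaluation can be sketched as a filter followed by a ranking. The example below keeps samples whose LSPR falls within the target window and sorts them by larger OD ratio first and narrower FWHM second; this tie-breaking order, the `Sample` fields, and all numeric values are assumptions for illustration, not the evaluation criteria or data of Table S22.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    name: str
    lspr_nm: float   # LSPR peak position
    od_lspr: float   # optical density at the LSPR peak
    od_tspr: float   # optical density at the TSPR peak
    fwhm_nm: float   # full width at half maximum of the LSPR peak

    @property
    def od_ratio(self) -> float:
        return self.od_lspr / self.od_tspr


def rank_candidates(samples, target_nm: float, tol_nm: float = 10.0):
    """Keep samples inside target_nm +/- tol_nm; rank by OD ratio, then FWHM."""
    in_window = [s for s in samples if abs(s.lspr_nm - target_nm) <= tol_nm]
    return sorted(in_window, key=lambda s: (-s.od_ratio, s.fwhm_nm))


# Hypothetical measurements for four wells.
samples = [
    Sample("A1", 805.0, 1.20, 0.40, 150.0),
    Sample("A2", 812.0, 1.40, 0.35, 130.0),
    Sample("A3", 790.0, 1.50, 0.30, 120.0),  # 18 nm off target, excluded
    Sample("A4", 800.0, 1.00, 0.50, 110.0),
]

ranked = rank_candidates(samples, target_nm=808.0)
print([s.name for s in ranked])  # ['A2', 'A1', 'A4']
```

A weighted score combining both metrics would be an equally plausible design; the lexicographic sort is used here only because it keeps the criteria easy to read.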

Discussion
Training scientists with the required knowledge takes considerable resources and different chemical and materials synthesis routes may lead to diverse outcomes even for trained personnel. Moreover, most inorganic synthesis involves trial-and-error and laborious tasks with unavoidable unintentional errors/bias. The Robotic Scientist platform described here demonstrates a notable advancement of automation pertaining to the synthesis of nanocrystals and presents an essential step towards data-driven materials synthesis. The sophisticated closed loop involving rational design, controllable synthesis, and retrosynthesis is achieved by convergence of the Robotic Scientist-assisted synthesis on the macro-scale and nanocrystal growth on the nano-scale. The existing chemical knowledge based on data mining, thermodynamic and kinetic models, as well as ML models are combined to accelerate rational design of the nanocrystal's morphology with initial hypotheses. To avoid blind materials optimization, orthogonal experiments and single-, double-, and triple-factor experiments are conducted systematically in iterations and then the database is constructed for effective training of the ML models to enable controllable synthesis of nanocrystals. In these processes, the accessible large data-set (in-situ characterized UV-Vis-NIR absorption spectra and RGB color results) and small data-set (ex-situ TEM validation) are generated to establish the Au nanocrystals genome, and interpretation of the genome plays a vital role in supporting the retrosynthesis process. It is demonstrated that the Robotic Scientist can be trained like a human scientist for retrosynthesis and scale-up synthesis of the targeted Au nanocrystals. This work focuses on establishing the closed loop (design-synthesis-retrosynthesis) of automation in nanocrystal synthesis using the Robotic Scientist platform.
Although the complete Robotic Scientist is an ambitious objective, the prototype is a good start towards a Robotic Scientist with the essential capabilities of forming scientific hypotheses, running experiments by synergizing the hardware and software components, and interpreting results. It is believed that future efforts will close the gap with eventual automation of all aspects of nanocrystal synthesis. Although the Robotic Scientist is only demonstrated for Au nanocrystals in this work, the insights gained reveal the possibility of automation to accelerate data-driven materials innovation on the nano-scale.

Methods
Operation of the Robotic Scientist platform. The operating video of the Robotic Scientist platform with features is provided in the supplementary information. An illustration of the experimental preparation (sample storage and consumable intelligent management, a mobile robot for microplate transport at central line, a synthesis platform for in-situ sampling, three automatic pipettors for liquid handling, shake module for integrating operation, and a robotic arm for commercial equipment service in right circle) and experimental characterization (microplate reader for characterization of UV-Vis-NIR absorption spectra and color-ultra-sensitive mobile camera for in-situ color characterization) is presented.
Data mining of synthesis parameters. The parameters involved in Au nanocrystal synthesis are recommended by our recently developed Automated Literature Recommendation System, a software package that can read scientific papers with Chemical Named Entity Recognition 9, expressions and grammatical structures 21, and some special rules in the nanomaterials research field. Using computers to read and digest reported works of many research groups, we were able to find the statistically representative synthesis parameters from 1,300 relevant journal papers downloaded from publishers such as Springer Nature, ACS Publications, RSC Publishing, Wiley, Science, and ScienceDirect (Elsevier). From the plotted frequency distribution maps, we extracted the most frequently used parameters for designing experiments on the Robotic Scientist platform.
Thermodynamic models. To achieve rational design of the morphology, thermodynamic models are derived by integration of Wulff construction 24 , Gibbs adsorption isotherm 25,26 , and Langmuir adsorption model 27,28 as follows. The relationship of the surface area of equilibrium morphology, surface energy, and concentration of reagent is investigated.
$$\Delta G = \sum_{hkl} \gamma_{hkl}\, O_{hkl} \qquad (1)$$

where $O_{hkl}$ is the surface area and $\gamma_{hkl}$ is the surface energy (the energy required to create a surface of unit area parallel to the (hkl) plane of the crystal). The surface energy of the (hkl) surface is proportional to the distance $h_{hkl}$ from the crystal's centre to the corresponding surface:

$$\frac{\gamma_{hkl}}{h_{hkl}} = \text{constant} \qquad (2)$$

The equilibrium morphology of the Wulff construction relies on the surface energy ratio. However, direct measurement of the surface energy remains challenging. In this work, the FSAs are used as the input parameters of the ANN to acquire the surface energy. It should be pointed out that the absolute value of the surface energy cannot be obtained from the equilibrium morphology and only the surface energy ratio can be determined. By selecting a reference surface such as $\{h_0 k_0 l_0\}$, equation (1) can be written as

$$\Delta G = \gamma_{h_0 k_0 l_0} \sum_{hkl} A_{hkl}\, \frac{\gamma_{hkl}}{\gamma_{h_0 k_0 l_0}} \qquad (3)$$

where $\gamma_{h_0 k_0 l_0}$ is the surface energy of the $\{h_0 k_0 l_0\}$ facet, $A_{hkl} = O_{hkl}/O_{total}$ is the fractional surface area, and $O_{total}$ is the total surface area. The thermodynamic relationship between the concentration $c$ and the surface adsorption excess $\Gamma$ is described by the Gibbs adsorption isotherm 19,20:

$$\Gamma = -\frac{c}{RT}\,\frac{d\sigma}{dc} \qquad (4)$$

where $T$ is the temperature, $R$ is the universal gas constant, and $\sigma$ is the surface tension. The surface excess is usually evaluated based on the Langmuir adsorption model 12,21:

$$\Gamma = \Gamma_s\,\frac{\alpha c}{1 + \alpha c} \qquad (5)$$

where $\Gamma_s$ is the saturated adsorption and $\alpha$ is the Langmuir constant. Combining equation (4) and equation (5) [...] (9)

where $AR_\infty$ is the final AR, $AR_m$ is the maximum AR in the growth process, $t_m$ is the time corresponding to the maximum AR, and $\tau$ is the decay rate of the AR. Based on the results in Fig. 3g and Fig. S7, a first-order reaction 30 can be expressed as:

$$r = -\frac{d\,OD_{[Au^+]}}{dt} = k\, OD_{[Au^+]} \qquad (10)$$

where the optical concentration of [Au+] is extracted from the UV-Vis-NIR spectra and expressed in the form of $OD_{[Au^+]}$, $r$ is the reaction rate, and $k$ is the reaction rate constant.
The integrated rate law for the pseudo-first-order reaction can then be obtained by:

$$\ln \frac{OD_{[Au^+]}(t)}{OD_{[Au^+]}(0)} = -k t$$

Figure 5 (caption, partial): [...] showing the experimental database, decision plate, and the evaluation criteria for optimization. d, Sequential scale-up of representative nanocrystals with parameters for minor modification study (LSPR located within 633 ± 10 nm, 780 ± 10 nm, and 808 ± 10 nm; 1 mL on a 96-well microplate, 2 mL on a 12-well microplate, 4 mL on a 6-well microplate, 20 and 40 mL on one plate). e, Bench-scale experiments on a magnetic stirrer (200 and 1,000 mL), and pilot-scale experiment in an agitated vessel (15 L).
