Parallel Genetic Algorithm Interface II: A novel computational tool for accelerated simulation-based optimization

The ever-increasing power of computational tools, encouraged by the general need for development of more sustainable technologies, fuels the interest in modern optimization approaches. While simulation-based optimization has received considerable attention in the past decades, it still struggles to overcome some challenges, namely excessive computation time. This study proposes a novel optimization interface, the Parallel Genetic Algorithm Interface II (PAGAN-II), which utilizes parallelization of flowsheet simulations to drastically reduce the optimization time without the need for clustered CPUs and/or modified optimization algorithms. Results of a detailed performance study showed up to a 2100% increase in computation rate when optimizing demanding process flowsheets, and approximately a 300% increase when optimizing simple ones. Capabilities of the proposed interface were demonstrated by optimization of a 5 MTPA C3MR LNG technology processing 12 different feedstocks, where a 15–30% decrease in the specific energy consumption was achieved. At the same time, the algorithm increased the optimization speed 13-fold compared to the traditional approach. This translates into a reduction of optimization time from 69 days of non-stop computation to approximately 7 days.

parallelization concept works. No comments are given on the computation time either, and it remains unknown how to obtain this tool.

PAGAN: Parallel Genetic Algorithm Interface
In our previous work, we presented a novel approach to the issue: the Parallel Genetic Algorithm Interface (PAGAN) [28]. The PAGAN algorithm, coded in the Matlab programming language, uses the ActiveX Automation Server, which is a standard way for an external application to interact with Aspen Plus [43]. However, contrary to the traditional approach, the PAGAN algorithm uses: 1. vectorization of the fitness function, i.e., calling the fitness function on the entire population at once [44]; 2. asynchronous running of the Aspen Plus simulations, i.e., exploitation of the fact that any number of Aspen Plus simulations can run simultaneously (provided that they have different names) and the algorithm (Matlab) does not wait for a running simulation to finish but continues to the next step. To accelerate the calculation, the population is divided into a predefined number of groups, N. Next, N Aspen Plus simulations are initiated and each group of individuals is assigned to one simulation. Thus, a population of G individuals is simultaneously evaluated by N Aspen Plus simulations, as described in Fig. 1. This way, the number of calculation cycles per generation is reduced N-fold, which drastically reduces the computation time.
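For illustration, the group-based evaluation used by PAGAN can be sketched in Python (a sketch only; the real tool is written in Matlab and calls Aspen Plus through the ActiveX server, so `run_simulation` here is a dummy stand-in for one flowsheet evaluation):

```python
from concurrent.futures import ThreadPoolExecutor

def run_simulation(individual):
    # Hypothetical stand-in for evaluating one individual in an
    # Aspen Plus simulation instance (in reality, an ActiveX call).
    return sum(individual)  # dummy fitness value

def evaluate_population_pagan(population, n_engines):
    """Split the population into n_engines groups and evaluate the
    groups concurrently, one group per simulation engine."""
    groups = [population[i::n_engines] for i in range(n_engines)]

    def evaluate_group(group):
        # Each engine works through its assigned group sequentially.
        return [run_simulation(ind) for ind in group]

    with ThreadPoolExecutor(max_workers=n_engines) as pool:
        results = list(pool.map(evaluate_group, groups))

    # Reassemble fitness values in the original population order:
    # groups[g][j] corresponds to population[g + j * n_engines].
    fitness = [None] * len(population)
    for g, group_results in enumerate(results):
        for j, value in enumerate(group_results):
            fitness[g + j * n_engines] = value
    return fitness
```

With N engines, each generation needs only ceil(G/N) sequential evaluations per engine instead of G, which is the N-fold reduction described above.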
The PAGAN algorithm was compiled into a user-friendly graphical user interface (GUI) and provided as freeware [28]. Because of the simplistic nature of the interface, it can be used by virtually any user with minimal prior experience in optimization using GA/NSGA-II. It was successfully applied to the optimization of a reactive distillation column design by Šulgan et al. [45].
Fig. 1. Comparison of the traditional approach and the PAGAN algorithm [28].

Contribution of this work
As mentioned above, simulation-based optimization suffers from the need for a great number of repetitive simulations and from the fact that software linking may prove challenging. To make simulation-based optimization fast and available to anyone in academia and/or industry, a novel optimization interface, the Parallel Genetic Algorithm Interface II (PAGAN-II), was developed. PAGAN-II is built on the foundations of the original PAGAN software but employs an improved optimization engine. This engine allows the algorithm to run the parallel simulations more effectively than PAGAN. As a result, PAGAN-II is expected to be faster and more stable than the previous version. Finally, thanks to its simple GUI, it is expected to be particularly useful for less experienced users.

Methodology
2.1 PAGAN-II interface description
PAGAN-II is a Matlab-coded algorithm and interface which uses the Aspen Plus ActiveX Automation Server to interact with Aspen Plus simulations, and the GA/NSGA-II algorithms for optimization. A simplified layout of the algorithm is displayed in Fig. 2. To accelerate the optimization, PAGAN-II creates a predefined number of copies of the desired simulation (simulation engines). During the optimization, the entire population is sent to the optimization engine at once. The optimization engine assigns each simulation engine an individual, and the simulation engines start evaluating these individuals simultaneously. The optimization engine then continuously scans all simulation engines for their activity. Once a simulation engine becomes idle, i.e., the simulation run is completed, the results of the simulation are collected, and another individual is assigned to the simulation engine. This way, true parallelization is achieved without the need for clustered computers or a server, and the optimization time is greatly reduced. Furthermore, it is generally known that continuous running of a software gradually increases its load on the RAM and overwhelms the processors' cache memory. This phenomenon causes a decrease in computational speed. To combat this issue, PAGAN-II periodically reinitializes the simulation engines to diminish the effect of the gradual optimization slow-down. The actual script of the optimization engine's algorithm is provided in the Appendix. The PAGAN-II optimization interface is provided in the Supplementary material.
Fig. 2. PAGAN-II optimization engine.
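The scan-and-assign loop described above can be sketched in Python (a simplified illustration, not the actual Matlab/ActiveX implementation; the `SimEngine` class, its fake completion timing, and the dummy fitness are hypothetical stand-ins for Aspen Plus calls):

```python
import time

class SimEngine:
    """Hypothetical wrapper around one Aspen Plus simulation instance."""
    def __init__(self):
        self.current = None
        self.done_at = 0.0

    def start(self, individual):
        # In PAGAN-II this would push inputs over the ActiveX server
        # and start the run; here we just fake a short completion time.
        self.current = individual
        self.done_at = time.monotonic() + 0.01

    def is_idle(self):
        return self.current is None or time.monotonic() >= self.done_at

    def collect(self):
        result = sum(self.current)  # dummy fitness value
        self.current = None
        return result

def evaluate_population(population, n_engines):
    """Continuously scan the engines; whenever one is idle, collect its
    result and immediately hand it the next waiting individual."""
    engines = [SimEngine() for _ in range(n_engines)]
    pending = list(enumerate(population))
    fitness = [None] * len(population)
    running = {}  # engine -> index of the individual it is evaluating
    while pending or running:
        for engine in engines:
            if engine in running and engine.is_idle():
                fitness[running.pop(engine)] = engine.collect()
            if engine not in running and pending:
                idx, individual = pending.pop(0)
                engine.start(individual)
                running[engine] = idx
    return fitness
```

Unlike the cycle-based PAGAN scheme, no engine ever waits for its peers: the slowest simulation delays only its own engine, not the whole generation.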

Advantages of the PAGAN-II algorithm
Obviously, individual simulation runs can take different amounts of time, mainly because some simulation runs do not converge well or do not converge at all. Because the original PAGAN works in cycles, the slowest-working simulation engine determines the optimization speed. However, because PAGAN-II employs a series of independent simulation engines, this problem is eliminated. To put things into perspective, one can imagine the difference in performance between the standard approach, PAGAN, and PAGAN-II with the following example (Fig. 3). Three groups of students are given six assignments, each of which takes a different time to finish (for demonstration, suppose the simulation times are 3, 3, 12, 3, 6, and 3 seconds, respectively). The first group consists of only one student, because only one simulation engine is employed in the standard approach. The other two groups (PAGAN and PAGAN-II) consist of three students (i.e., three simulation engines). The differences in the process of optimization can then be described as follows:
• Using the standard approach, the algorithm solves each of the assignments one after another, and the total time is approximately equal to the sum of the times needed to finish the individual assignments.
• PAGAN takes on the first three assignments and solves them simultaneously. However, the algorithm needs to wait for all three assignments to be done before proceeding to the next group of assignments.
• In PAGAN-II, the individual engines work independently, and once an engine is not busy, it immediately takes on another assignment.
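The arithmetic of this example can be checked with a short Python sketch; `pagan2_time` emulates the independent engines with greedy list scheduling (an assumption about dispatch order that matches the description above):

```python
def standard_time(times):
    # One engine solves all assignments sequentially.
    return sum(times)

def pagan_time(times, n_engines):
    # Engines work in synchronized cycles: each cycle waits for its
    # slowest member before the next group of tasks is dispatched.
    cycles = [times[i:i + n_engines] for i in range(0, len(times), n_engines)]
    return sum(max(cycle) for cycle in cycles)

def pagan2_time(times, n_engines):
    # Independent engines: a finished engine immediately takes the next
    # task; the makespan is the latest engine finish time.
    free_at = [0] * n_engines
    for t in times:
        i = free_at.index(min(free_at))
        free_at[i] += t
    return max(free_at)

times = [3, 3, 12, 3, 6, 3]
print(standard_time(times))   # 30 s for the standard approach
print(pagan_time(times, 3))   # 18 s for PAGAN (cycles of 12 s and 6 s)
print(pagan2_time(times, 3))  # 12 s for PAGAN-II (bounded by the 12 s task)
```

The 12-second assignment fully hides the remaining short tasks in PAGAN-II, whereas PAGAN pays for it once per cycle.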

3 Performance assessment
To assess the performance of the newly developed optimization interface, a series of optimization studies was performed. First, the interface was used for the optimization of a natural gas liquefaction process to demonstrate the possibilities brought by this tool, and second, a detailed performance study was realized to better understand the algorithm's behavior.

C3MR LNG process
Since global energy requirements are expected to rise by 56% between 2010 and 2040 [46], and renewable sources of energy are still under development, a search for a suitable transition fuel is underway. Natural gas (NG) is generally considered the cleanest traditional fuel, with an expected consumption rise of 40% between 2017 and 2040 [35]. Due to its suitability for long-distance transport, liquefied natural gas (LNG) has become an important trade commodity. However, the energy requirements of its production are substantial. Therefore, dozens of publications arose in the last decade aiming at the optimization of the LNG process [35,47].
The propane-precooled mixed-refrigerant (C3MR) process is the most prevalent liquefaction technology used today and produces more LNG than any other process in the world [28]. It consists of two working cycles: the propane cycle, where the NG is cooled to approximately -35 °C and the mixed refrigerant (MR) is partially condensed by sequential partial evaporation of compressed propane; and the mixed-refrigerant cycle, where the NG is subcooled and liquefied in a coil-wound heat exchanger due to the expansion of the MR. It is a relatively simple but highly non-linear process, which makes it a great candidate for studying simulation-based optimization. Moreover, the composition of NG varies greatly depending on its source. The nitrogen content in the NG can range from less than 1 mol% to more than 20 mol%. This strongly affects the necessary pressure levels and the composition of the MR, which needs to be considered. In this work, a C3MR LNG unit producing 5 MTPA of LNG while processing 12 different feedstocks (Table 1) was simulated in Aspen Plus V12 and optimized by the PAGAN-II algorithm. Process parameters and assumptions were adopted from Park et al. [48]. The Peng-Robinson equation of state was selected for this study as it is the thermodynamic model of choice in virtually every LNG study [35,47,48]. The basic layout of the technology is displayed in Fig. 4.

Optimization using genetic algorithm
As a demonstration, the C3MR LNG process encompassing all 12 feedstock alternatives was optimized by the proposed optimization interface using 200 individuals and 200 generations. The selected objective was to achieve the minimal specific energy consumption, SEC, i.e., the objective function was stated as follows:

min SEC(X) = W / m_LNG    (1)

where W is the cumulative duty of the compressors, kW, and m_LNG is the mass flow rate of produced LNG, kg s⁻¹.
In the objective function (1), X denotes the matrix of decision variables. A total of 19 decision variables were optimized, based on [48,59] and our previous work [28]. The decision variables as well as their ranges are listed in Table 2.
To achieve reasonable results, several constraints must be satisfied. Most of them can be presented as upper and lower bounds of the variables, as listed in Table 2. Additionally, the temperature of the condensed propane (C3-00) cannot exceed 39 °C. Finally, an individual can be considered relevant only if the simulation converged. Hence, these constraints affect the final value of the objective function in the form of a penalty function: individuals that violate the temperature constraint, or whose simulation does not converge, are penalized in their objective value.
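Since the exact form of the penalty function is not reproduced in this excerpt, the constraint handling can be illustrated with a minimal Python sketch (the function names, the penalty value of 1e6, and the simple if/else scheme are illustrative assumptions, not the paper's formulation):

```python
def specific_energy_consumption(W_total_kW, m_lng_kg_s):
    """SEC = cumulative compressor duty / LNG mass flow rate (Eq. 1)."""
    return W_total_kW / m_lng_kg_s

def penalized_objective(W_total_kW, m_lng_kg_s, t_c3_00_C, converged,
                        penalty=1e6):
    """Objective value with constraint handling via a penalty:
    - the condensed propane (C3-00) temperature must not exceed 39 degC,
    - the flowsheet simulation must have converged.
    Infeasible individuals receive a large assumed penalty value."""
    if not converged or t_c3_00_C > 39.0:
        return penalty
    return specific_energy_consumption(W_total_kW, m_lng_kg_s)
```

In a GA, such a penalty simply makes infeasible individuals uncompetitive in selection without aborting the run.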

Performance test
To assess the performance of PAGAN-II and to compare its capabilities with PAGAN and the traditional approach, four performance tests were realized. A 64-bit desktop computer with an AMD Ryzen 9 3900X 3.80 GHz 12-core processor and 48 GB of RAM was used for the analysis. The abovementioned C3MR LNG process was used as a model unit for optimization. The performance test was realized for various numbers of individuals and generations, as listed in Table 3. For the sake of reproducibility, each optimization run was started with an identical initial population. For a deeper insight into the optimization process, the optimization times for each generation were also measured during Test 1 and visualized in Fig. 6 and Fig. 7.
Fig. 6. Measured optimization times per generation for Test 1 using PAGAN.
Fig. 7. Measured optimization times per generation for Test 1 using PAGAN-II.
It can be observed in the figures that PAGAN-II is not only faster, but the fluctuations in optimization times per generation are also flattened due to the parallel nature of the algorithm, which diminishes the effect of non-converged individuals. However, it can be seen that the optimization time gradually increases with each new generation. This phenomenon can be attributed to the fact that continuous running of any software gradually consumes more RAM and overwhelms the processor cache memory, which results in increased runtime. For a more representative illustration, measured optimization times of generations without non-converged individuals are plotted in Fig. 8. The slopes of the trendlines in the figure show that increasing the number of parallel simulations diminishes the effect of the gradual computation slow-down, and also that PAGAN-II is more effective in doing so than PAGAN.
Fig. 8. Measured optimization times per generation for Test 1 comprising solely fully-converged populations. Legend: triangles - standard approach, circles - PAGAN, squares - PAGAN-II.
To combat the gradual computation slow-down, a reinitialization concept was employed. The idea is simple: the simulation engine is restarted every pre-set number of generations. This concept was tested in Test 2, where the number of generations was increased to 50. Results of Test 2 are visualized in Fig. 9. In the figure, it can be observed that the reinitialization of the simulation environment greatly decreases the optimization time, which is restored approximately to the times reached at the beginning of the optimization. Total optimization times with different numbers of parallel simulations, and the differences caused by employing the reinitialization concept, are displayed in Fig. 10. In this case, one needs to define the base for assessing the relative computation rate, because the reinitialization concept also accelerates the optimization using the standard approach, by a factor of approximately 1.5. Hence, three sets of different relative rates can be obtained. The most important, however, is the relative rate which compares the optimization times with and without the reinitialization concept (orange line). In Tests 3 and 4, the numbers of individuals and generations were set to values (Table 3) which are closer to those commonly used in optimization studies. Due to time reasons, the number of tests was reduced so that only optimizations using 1, 2, 4, 8, and 12 parallel simulations were assessed. Fig. 11, Fig. 12, Fig. 13, and Fig. 14 document a decrease in relative computation rate between 8 and 12 parallel simulations. To better understand the PAGAN-II algorithm and to explain this phenomenon, a mathematical model mimicking the algorithm's behavior was created and thoroughly studied. The mathematical model is provided in the Supplementary material. Based on the study of the measured algorithm performance, the following assumptions were applied:
• the simulation time for each individual is linearly dependent on the elapsed optimization time (to account for the gradual slow-down),
• a quadratic relationship exists between the intercept of the simulation time and the number of parallel simulations (i.e., increasing the number of parallel simulations lengthens the base simulation time),
• the read-and-write time of the algorithm is linearly dependent on the elapsed optimization time (i.e., Matlab performance is also affected over time),
• reinitialization of the simulation environment resets the simulation time for individuals but does not affect the read-and-write time of the algorithm (i.e., the Aspen Plus simulations are reinitialized but the Matlab algorithm is not),
• converged and unconverged individuals have different simulation times,
• the number of unconverged individuals in a model generation is equal to the number of unconverged individuals in the respective measured generation,
• the exact individuals considered unconverged are chosen randomly.
These assumptions result in a set of equations whose parameters X1–X10 were fitted to the measured data using the least-squares method.
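The equation set itself is provided in the Supplementary material; as a sketch only, one form consistent with the assumptions above (linear drift with the elapsed time τe, quadratic intercepts in N, and a linear read-and-write time) could be written as:

```latex
\tau_{C,i} = b_C + X_1\,\tau_e, \qquad b_C = X_2 + X_3 N + X_4 N^2, \\
\tau_{U,i} = b_U + X_5\,\tau_e, \qquad b_U = X_6 + X_7 N + X_8 N^2, \\
\tau_{RW,i} = X_9 + X_{10}\,\tau_e
```

This reconstruction uses exactly ten parameters, matching X1–X10, but it is a plausible sketch only; the authors' actual equations may differ.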
The resulting parameters of the model are summarized in Table 4. The model was verified against the measured times per generation as well as the total optimization time. Results of the verification for Test 3 are shown in Fig. 15 and summarized in Table 5 and in Fig. 16. It needs to be noted here that, even though the mathematical model is generally applicable to any problem, the actual values of parameters X1–X10 depend on the actual optimization problem and the hardware used. Results of the verification show that the mathematical model can mimic the algorithm's behavior satisfactorily, as demonstrated in Fig. 15. The model's results deviate by no more than 2% from the measured data for the majority of cases and up to 5% in some specific cases, where the deviation can be attributed to the uncertainty of measurement rather than the model's precision. The verified model was subsequently used for a deeper study of the algorithm's behavior. The activity of the individual simulation instances is displayed in Table 6. "White spaces" in the plots enclosed in the table represent the time when the respective simulation cores are idle, i.e., they are not running a simulation and are waiting for another individual to be sent to the simulator. This "waiting time" is the time necessary for the algorithm to read the results of the simulation, write them into an internal variable, and send new parameters to the simulator, i.e., the read-and-write time. It is evident that the algorithm uses 8 parallel simulations more effectively than 12. Numerically speaking, when using 8 parallel simulations, the simulation engines are active for 80-85% of the total optimization time, whereas some simulations are active for only 77% of the total time when using 12 parallel simulations. Therefore, the algorithm spends a considerable amount of time just "waiting" when using 12 parallel simulations, which results in a longer optimization time.
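The trade-off between 8 and 12 parallel simulations can be illustrated with a toy queueing model (a sketch under assumed timings, not the paper's measured data): if a single dispatcher thread serializes all read-and-write operations, then roughly once the combined read-and-write demand of N engines exceeds one simulation time, engines start queueing for the dispatcher and their utilization drops.

```python
def engine_utilization(n_engines, n_tasks, sim_time, rw_time):
    """Toy model of a polling dispatcher: one sequential thread performs
    all read-and-write operations (duration rw_time each), so an engine
    that finishes while the dispatcher is busy must wait for it.
    Returns the mean fraction of time the engines spend simulating."""
    engine_free = [0.0] * n_engines   # when each engine can accept work
    dispatcher_free = 0.0             # when the dispatcher is free again
    busy = [0.0] * n_engines          # accumulated simulation time
    for _ in range(n_tasks):
        i = engine_free.index(min(engine_free))
        # Dispatching (reading results, writing new inputs) is serialized.
        start = max(engine_free[i], dispatcher_free) + rw_time
        dispatcher_free = start
        engine_free[i] = start + sim_time
        busy[i] += sim_time
    makespan = max(engine_free)
    return sum(busy) / (n_engines * makespan)

# Assumed timings: a 10 s simulation and a 1 s read-and-write step.
# With 8 engines the dispatcher has slack; with 12 it saturates
# (12 * 1 s > 10 s), so per-engine utilization falls.
u8 = engine_utilization(8, 400, 10.0, 1.0)
u12 = engine_utilization(12, 400, 10.0, 1.0)
```

The assumed 10 s / 1 s split is illustrative only, but it reproduces the qualitative effect reported above: adding engines beyond the dispatcher's capacity lowers utilization and lengthens the run.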

Performance case studies
The performance test unveiled a significant impact of the read-and-write time on the overall performance of the algorithm. However, it was assumed that this impact depends on the base simulation time. To test this assumption, three case studies with various simulation times were conducted: using the conditions of Test 3, using the conditions of Test 4, and employing 200 individuals and 200 generations. In the studies, it was assumed that the read-and-write time and its intercept, as well as the impact of the number of parallel simulations, remain unchanged. Because the distribution of unconverged individuals in the case-study optimizations was unknown, a function for the fraction of unconverged individuals (FUI) in generation g was constructed and fitted to the average fractions of unconverged individuals in Tests 1 to 4 (Fig. 17). The case studies were done with the following simulation times: 1, 3, 5, 10, 15, and 30 seconds. First, the effect of different simulation times was studied for the case of Test 3, i.e., 100 individuals and 50 generations. Results of the case study for Test 3 are displayed in Fig. 18 and Fig. 19. Fig. 18 shows that the base simulation time has a significant effect on the total optimization time achieved by the algorithm. This effect is better observable from the values of relative computation rates (Fig.
19), where it can be seen that, for fast simulations, i.e., with a simulation time of less than 5 seconds, the relative computation rate reaches a maximum. Therefore, in such cases, using additional parallel simulations is not beneficial. Nonetheless, the PAGAN-II algorithm still achieves an over 300% increase in the optimization rate. However, it can be observed that using additional parallel simulations is highly beneficial in the case of demanding simulations, i.e., with a simulation time over 10 seconds, where an almost 1300% increase in the computation speed can be achieved. Similar results can be observed for the case of Test 4, i.e., optimization using 100 individuals over 100 generations (Fig. 20). However, the situation changes slightly when more individuals and generations are added. In Fig. 21, it can be observed that when using 200 individuals and 200 generations (which is a common practice in optimization using the genetic algorithm), the impact of the read-and-write time is lower, and using a higher number of parallel simulations becomes beneficial for simulations with a simulation time over approximately 4 seconds. Furthermore, significantly higher relative computation rates (up to 2100%) can be achieved in such optimizations. Prior to the optimization of all 12 alternatives, the model's prediction was used to choose the optimal number of parallel simulations. In Fig.
22, it can be observed that an optimization of 12 process alternatives with 200 individuals and 200 generations would take approximately 69 days (more than 2 months) of non-stop computation if the standard approach were used. This time was drastically reduced to less than one week (approximately 7 days) with the use of PAGAN-II with 12 parallel simulations. It should be noted that the model prediction of the optimization time using the standard approach assumes reinitialization of the simulation environment and, hence, the "true" standard approach would require a significantly longer time than 2 months. Finally, results of the optimization are summarized in Table 7. It can be observed that a considerable decrease in specific energy consumption was achieved with all types of feedstocks. Depending on the actual feedstock, the SEC was decreased in the range of 15 to 30%. Regarding the process parameters, the pressure ratios in the mixed-refrigerant compressor, the outlet temperatures of the main coil-wound heat exchanger, and the content of nitrogen in the mixed refrigerant have proven to be the most important.

Study limitations and further potential
Time-related challenges of simulation-based optimization are well documented in the literature as well as in the presented study. Various authors have battled these challenges differently, as summarized in Table 8. While most studies utilize artificial neural networks or surrogate modeling, some authors proposed modifications of the original algorithm to improve its efficiency. However, it is virtually impossible to quantify the impact of their modifications and/or approaches on the overall optimization time and to compare those results with the results of this study. To date, there is, to the best of our knowledge, no comprehensive interface which would utilize the parallelization concept apart from the Adv:PO interface proposed by Ernst et al. [5] and used by Johannsen et al. [41]. However, even this interface relies mostly on a modified algorithm, and the mentioned parallelization concept is not discussed in detail. Therefore, we consider PAGAN-II a novel and unique approach in the field of simulation-based optimization.
Results of the performance assessment and the subsequent case studies proved the potential of PAGAN-II as a generally applicable optimization tool. Even though the exact results of the performance test are heavily dependent on the hardware used, it can be stated that the algorithm's internal behavior is not. Therefore, any future user will find that:
• PAGAN-II is especially useful when dealing with demanding simulation problems.
• The effectiveness of PAGAN-II decreases when dealing with "fast" simulations, although a significant increase in relative computation rate can nonetheless be achieved.
It should be noted that the presented results were achieved on a desktop computer and that the current version of PAGAN-II utilizes the original GA/NSGA-II algorithms. However, since the interface is highly modifiable, more efficient versions of the algorithm and/or other optimization algorithms can easily be implemented, which might lead to a further acceleration of the optimization rate. Furthermore, an option to utilize clustered CPU systems (servers) exists, which could speed up the computation even more. Hence, the proposed interface has considerable potential in both academic and industrial applications.
Table 8 Approaches of different authors to simulation-based optimization using GA/NSGA-II.

Fig. 3. Demonstration of the optimization process for the standard approach, PAGAN, and PAGAN-II. Legend: black - start of a simulation, green - end of a simulation.

Fig. 5. Average optimization times and average relative computation rates for Test 1.


where τC,i and τU,i are the simulation times of the i-th converged and unconverged individuals, s, respectively; bC and bU are the base times of converged and unconverged simulations, s, respectively; τRW,i is the read-and-write time of the i-th individual, s; N is the number of parallel simulations; τe is the elapsed optimization time, s; and X1–X10 are the parameters of the mathematical model.

Fig. 16. Comparison of measured and model-provided total optimization times (log scale). Legend: full bars - measured data; lined bars - model data.

Fig. 18. Optimization times for various base simulation times for Test 3. Note: numbers in the legends of Figs. 18 to 21 represent base simulation times (in seconds).

Fig. 19. Relative computation rates for various base simulation times for Test 3.

Fig. 20. Relative computation rates for various base simulation times for Test 4.
Fig. 21. Relative computation rates for various base simulation times for 200 individuals and 200 generations.
4.4 Optimization of the C3MR LNG process
The proposed algorithm was utilized in the optimization of a 5 MTPA C3MR LNG process with 12 different feedstocks. 200 individuals and 200 generations were used in the optimization.

Fig. 22. Model prediction of the necessary time for the optimization of the C3MR LNG process with 12 various feedstocks and the associated relative computation rate.

Table 1
Different molar compositions of natural gas published in various studies.

Table 3
Parameters of the performance test.

Table 4
Parameters of the mathematical model.

Table 6
Activity of individual simulation instances over the optimization time and their overall utilization.

Table 7
Optimized parameters of the C3MR process.