In this section, the methods used to implement the gradient descent model ADAM coupled with TSUNAMI are discussed. The challenge problems used to test whether ADAM is a suitable optimization method for nuclear systems are also introduced.

## Algorithm

The gradient descent model used in this analysis is the ADAM (adaptive moment estimation) method [7]. The derivatives given by TSUNAMI are calculated using Monte Carlo-based methods, making them stochastic by nature. This means that the derivatives informing the gradient descent algorithm are noisy and have an associated uncertainty. The previously used Interior Point Method does not account for uncertainty or noise, meaning the gradient was assumed to be exact; this requires long TSUNAMI run times, or else the resulting gradients are too noisy. ADAM is intended to correct this issue because it inherently accounts for uncertainty in the gradient, allowing unconverged sensitivities to be used to optimize nuclear systems.

The continuous-energy version of TSUNAMI-3D [3–6] is used to calculate eigenvalue and reaction-rate sensitivities for constructing the gradients used in the analysis. This version of TSUNAMI-3D provides two methods to calculate the sensitivity of the k-eigenvalue: Iterative Fission Probability (IFP) and Contribution-Linked eigenvalue sensitivity/Uncertainty estimation via Tracklength importance CHaracterization (CLUTCH). The IFP method calculates the adjoint-weighted tally and the importance for future generations based on the neutron population. The CLUTCH method uses an importance function and determines sensitivity through the number of fission neutrons created by a collision [10]. The IFP method is used for the optimization of \({k}_{eff}\) because it requires fewer neutron histories and therefore allows very unconverged sensitivities, obtained with fast run times, to build the gradient for each step. The sensitivity of the reaction rate is calculated using general perturbation theory through the GEneralized Adjoint Responses in Monte Carlo (GEAR-MC) method, which uses both the CLUTCH and IFP methods [11]. This method calculates the generalized importance function as a sum of intergenerational (IFP) and intragenerational (CLUTCH) effects. The sensitivities for both \({k}_{eff}\) and the reaction rate are given as functions of the macroscopic cross section of the material. The sensitivity with respect to the macroscopic cross section can be used as a density sensitivity because the macroscopic cross section is the product of the material number density and the microscopic cross section. In this work, microscopic cross sections and molar mass are assumed to be known for a given material, resulting in the following relationship between the macroscopic cross section and the mass density of the material:

$${\Sigma }=N\sigma =\frac{{N}_{A}}{M}\rho \sigma$$

where \(N\) is the number density, \(\sigma\) is the total microscopic cross section, \({N}_{A}\) is Avogadro’s number, \(M\) is the molar mass, and \(\rho\) is the mass density of the material, the physical design parameter that is varied for optimization.
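The relationship above can be sketched numerically. The following Python snippet is a minimal illustration of converting a mass density into a macroscopic cross section; the material values used in the example are hypothetical and not taken from this work:

```python
# Macroscopic cross section from mass density: Sigma = (N_A / M) * rho * sigma.
# The example material values below are illustrative only.
N_A = 6.02214076e23   # Avogadro's number [atoms/mol]

def macroscopic_xs(rho_g_cm3, molar_mass_g_mol, sigma_barns):
    """Return Sigma [1/cm] given mass density, molar mass, and microscopic xs."""
    sigma_cm2 = sigma_barns * 1e-24                        # 1 barn = 1e-24 cm^2
    number_density = N_A / molar_mass_g_mol * rho_g_cm3    # atoms/cm^3
    return number_density * sigma_cm2

# Example: a water-like material with a 10-barn total cross section (illustrative).
print(macroscopic_xs(1.0, 18.015, 10.0))   # ~0.334 per cm
```

Because \(N_A/M\) and \(\sigma\) are fixed for a given material, \(\Sigma\) scales linearly with \(\rho\), which is why a sensitivity to the macroscopic cross section can stand in for a density sensitivity.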

## Challenge problems

Four challenge problems are proposed to test the use of TSUNAMI to build a gradient for the ADAM method. Challenge problems one, two, and three mirror the challenge problems developed in the first publication [2]. The set-up is a 55 cm x 55 cm two-dimensional, unreflected system. This geometry is discretized into pixels, with the material constant within each pixel. The material in each pixel is a fixed homogeneous mixture of UO2 (at 3% enrichment) and H2O, in a ratio of 1 \(\text{U}{\text{O}}_{2}\) to 3.4 \({\text{H}}_{2}\text{O}\). The density of this mixture in each pixel is varied as the parameter for optimization. The SCALE material card is reported in the additional information section.

This problem is purposefully designed to have a known optimal \({k}_{eff}\) of 1 with a perfectly circular geometry. The pixelated version of the optimal solution has a \({k}_{eff}\) slightly less than 1, depending on the spatial resolution. These problems aim to test ADAM as a nuclear system optimization algorithm, test whether a constraint can be implemented into the gradient, and determine whether unconverged sensitivities can be used in gradient descent optimization.

In the first challenge problem, the prism is broken into an 11 x 11 grid of 5 cm x 5 cm pixels that each have a unique density. The density in each pixel is expressed by the sigmoid function \(f\left(x\right)\), seen below, which allows *x* to range from negative infinity to positive infinity while restricting the density to the range of zero to one. For this challenge problem, zero represents void and unity represents the density of a homogenized fuel pin. This challenge problem optimizes the density of each pixel to maximize \({k}_{eff}\) with a constraint on the total mass of the system. The total mass is restricted to the mass of 61 pixels at the nominal density. The value 61 was chosen because the system can become critical with 61 pixels (50.4% full) of the material in a cylindrical configuration. This problem aims to test the ability of ADAM to maximize the performance of a nuclear system while the variables are constrained.

$$f\left(x\right)=\frac{1}{1+{e}^{-x}}$$

To enforce the constraint on the \({k}_{eff}\) optimization problem, the objective function is modified by an exponential penalty term that lowers the score when the mass exceeds the 61-pixel limit. The hyperparameters of this penalty term must be tuned so that it forces the mass to the desired constraint. The objective function and gradient used for this problem are given below, where \(O\left(\stackrel{⃑}{x}\right)\) is the objective function, \(\frac{dO}{dx}\) is the gradient used within the ADAM algorithm, \(r\) and \(v\) are parameters that allow the penalty function to be tuned so that it only takes effect once the constraint is exceeded, and \(S\left(x\right)\) represents the sensitivities calculated by TSUNAMI.

$$O\left(\stackrel{⃑}{x}\right)=r\,{e}^{v\left({\sum }_{i=1}^{121}f\left({x}_{i}\right)-61\right)}-{k}_{eff}\left(\stackrel{⃑}{x}\right)$$

$$\frac{dO}{d{x}_{i}}=r\,v\,{e}^{v\left({\sum }_{j=1}^{121}f\left({x}_{j}\right)-61\right)}\frac{{e}^{-{x}_{i}}}{{\left(1+{e}^{-{x}_{i}}\right)}^{2}}-S\left({x}_{i}\right)$$
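As a sketch, the objective and gradient above can be written directly in Python. In practice the \({k}_{eff}\) value and the sensitivities would come from a TSUNAMI run; here they are random stand-ins, and the values of `r` and `v` are illustrative, not the tuned hyperparameters:

```python
import numpy as np

def sigmoid(x):
    """Map unbounded variables to pixel densities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def objective_cp1(x, k_eff, r, v, mass_limit=61.0):
    """Penalized objective for challenge problem 1: maximize k_eff under a mass cap."""
    excess = np.sum(sigmoid(x)) - mass_limit
    return r * np.exp(v * excess) - k_eff

def gradient_cp1(x, S, r, v, mass_limit=61.0):
    """Gradient fed to ADAM; S holds the (noisy) TSUNAMI k_eff sensitivities."""
    excess = np.sum(sigmoid(x)) - mass_limit
    dsig = np.exp(-x) / (1.0 + np.exp(-x)) ** 2   # derivative of the sigmoid
    return r * v * np.exp(v * excess) * dsig - S

# Illustrative call with random stand-ins for the TSUNAMI outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=121)                 # one variable per pixel of the 11 x 11 grid
S = rng.normal(scale=1e-3, size=121)     # placeholder sensitivities
g = gradient_cp1(x, S, r=1.0, v=0.1)
print(g.shape)                           # (121,)
```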

The second challenge problem uses the same geometry and density variable as problem one but aims to minimize mass with a constraint on \({k}_{eff}\). The constraint is set such that \({k}_{eff}\) must be greater than unity. This problem demonstrates the use of TSUNAMI sensitivities in the constraint function.

A new objective function was developed to minimize the mass of the system while constraining its \({k}_{eff}\). The equations used are given below, where \(O\left(\stackrel{⃑}{x}\right)\) is the objective function, \(\frac{dO}{dx}\) is the gradient used within ADAM, *r* and *v* are tuning parameters for the \({k}_{eff}\) constraint, \(\stackrel{⃑}{x}\) is the set of 121 density parameters, and \(S\left(x\right)\) refers to the sensitivities pulled from TSUNAMI.

$$O\left(\stackrel{⃑}{x}\right)=r\,{e}^{v\left(1-{k}_{eff}\left(\stackrel{⃑}{x}\right)\right)}-{\sum }_{i=1}^{121}\frac{1}{1+{e}^{-{x}_{i}}}$$

$$\frac{dO}{d{x}_{i}}=-\left(r\,v\,{e}^{v\left(1-{k}_{eff}\left(\overrightarrow{x}\right)\right)}S\left({x}_{i}\right)+\frac{{e}^{-{x}_{i}}}{{\left(1+{e}^{-{x}_{i}}\right)}^{2}}\right)$$
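This objective and gradient can likewise be sketched in Python, with `k_eff` and the sensitivity array `S` as stand-ins for TSUNAMI outputs and the sign conventions taken as written in the equations:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def objective_cp2(x, k_eff, r, v):
    """Mass term plus an exponential penalty that grows as k_eff drops below 1."""
    return r * np.exp(v * (1.0 - k_eff)) - np.sum(sigmoid(x))

def gradient_cp2(x, k_eff, S, r, v):
    """Gradient for ADAM; S holds the TSUNAMI k_eff sensitivities."""
    dsig = np.exp(-x) / (1.0 + np.exp(-x)) ** 2
    return -(r * v * np.exp(v * (1.0 - k_eff)) * S + dsig)

# Illustrative evaluation at x = 0 (every pixel at half density), k_eff exactly 1.
print(objective_cp2(np.zeros(121), 1.0, 1.0, 100.0))   # 1 - 60.5 = -59.5
```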

The third challenge problem is an expansion of the 11 x 11 geometry. It mimics challenge problem one’s geometry with a 44 x 44 pixelation, where the outer dimensions are still 55 cm x 55 cm and the material is varied in the same way. The same material is used as in the previous problems. The number of full cells is scaled proportionally to ensure the same amount of material is used: the new mass constraint is \(61\cdot \frac{{44}^{2}}{{11}^{2}}=976\) cells. This is the only change to the objective function and derivative used in the first challenge problem, where 61 is replaced with 976. This problem aims to show that the ADAM algorithm can still converge when the number of variables in the system is expanded. The finer grid also gives ADAM more geometric freedom to form a better-resolved solution. It should also be noted that the sensitivities from TSUNAMI will have a larger relative uncertainty per pixel due to the finer discretization; challenge problem three therefore also tests whether increased uncertainty and noise in the derivatives affect ADAM’s ability to find a solution.
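The rescaled constraint is a simple bookkeeping step: refining the grid by a factor of four multiplies the pixel count, and hence the pixel budget, by sixteen:

```python
# Scaling the mass constraint with grid refinement: the number of "full" pixels
# grows with the square of the refinement factor so total material is preserved.
coarse_limit = 61          # full pixels allowed on the 11 x 11 grid
refinement = 44 // 11      # each coarse pixel splits into refinement**2 fine pixels
fine_limit = coarse_limit * refinement**2
print(fine_limit)          # 976
```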

The fourth challenge problem is an 80 cm slab geometry reflected on two axes, effectively creating a 1-dimensional problem. The slab is divided into 8 equal regions in the non-reflected direction. The geometry is also reflected on the face of region one, doubling the slab size with material symmetry. The slab is made of the same material as the previous problems. This geometry was chosen to represent the axial flux shape of a 1-dimensional system. The objective of this challenge problem is to flatten the fission reaction rate profile across all cells by changing the density of the material in each region. This problem tests the ability of the GPT reaction rate sensitivity to serve as a gradient for optimization. Below are the objective function and derivative used for this challenge problem, where *i* and *j* refer to the discretization locations, *RR* refers to the reaction rate, and \(S\left({x}_{i}\right)\) is the sensitivity of the ratio of the reaction rate at location *i* to that at location *j*.

$$O\left(\stackrel{⃑}{x}\right)={\sum }_{i=1}^{8}{\sum }_{j=1}^{8}{\left(1-\frac{{RR}_{i}\left(\overrightarrow{x}\right)}{{RR}_{j}\left(\overrightarrow{x}\right)}\right)}^{2}$$

$$\frac{dO}{d{x}_{i}}=\sum _{j=1}^{8}\left[\frac{2}{{RR}_{j}}\bullet \left(\frac{{RR}_{i}}{{RR}_{j}}-1\right)\bullet S\left({x}_{i}\right)\right]$$
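A minimal sketch of this objective and the stated gradient follows, with the reaction-rate profile `RR` and sensitivities `S` as hypothetical stand-ins for the GEAR-MC outputs:

```python
import numpy as np

def objective_cp4(RR):
    """Flatness measure: squared deviation of every reaction-rate ratio from 1."""
    R = RR[:, None] / RR[None, :]          # R[i, j] = RR_i / RR_j
    return np.sum((1.0 - R) ** 2)

def gradient_cp4(RR, S):
    """Gradient as given in the text; S[i] stands in for the TSUNAMI sensitivity
    of the reaction-rate ratio at region i."""
    n = len(RR)
    g = np.zeros(n)
    for i in range(n):
        for j in range(n):
            g[i] += 2.0 / RR[j] * (RR[i] / RR[j] - 1.0) * S[i]
    return g

# A perfectly flat profile scores zero and produces a zero gradient.
print(objective_cp4(np.ones(8)))   # 0.0
```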

The ADAM [7] method name stands for adaptive moment estimation. ADAM combines two well-established gradient descent methods: Adagrad [8] and RMSProp [9]. It leverages Adagrad’s ability to handle sparse gradients and RMSProp’s utility in on-line and non-stationary settings. ADAM is designed to keep the step size magnitude insensitive to rescaling of the gradient, work well with sparse gradients, and have natural step size annealing. The ADAM algorithm can be seen below, where \({\beta }_{1}\), \({\beta }_{2}\), \(\epsilon\), and \(\alpha\) are hyperparameters used to tune performance, \(t\) is the current step, \(x\) is the list of variables being optimized, \({g}_{t}\) is the gradient vector, \(m\) is the first moment estimate, \(v\) is the raw second moment estimate, \(\widehat{m}\) is the bias-corrected first moment estimate, and \(\widehat{v}\) is the bias-corrected second moment estimate. The variables \({\beta }_{1}\) and \({\beta }_{2}\) are limited to [0,1) and determine the momentum of the algorithm, \(\alpha\) adjusts the step size, and \(\epsilon\) ensures the algorithm does not divide by zero. Momentum, in this case, refers to a gradient’s tendency to stay on its current path.

$${m}_{t}={\beta }_{1}\bullet {m}_{t-1}+\left(1-{\beta }_{1}\right)\bullet {g}_{t}$$

$${v}_{t}={\beta }_{2}\bullet {v}_{t-1}+\left(1-{\beta }_{2}\right)\bullet {g}_{t}^{2}$$

$${\widehat{m}}_{t}=\frac{{m}_{t}}{(1-{\beta }_{1}^{t})}$$

$${\widehat{v}}_{t}=\frac{{v}_{t}}{(1-{\beta }_{2}^{t})}$$

$${x}_{t}={x}_{t-1}-\frac{\alpha \bullet {\widehat{m}}_{t}}{\sqrt{{\widehat{v}}_{t}}+\epsilon }$$
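The five update equations above translate directly into code. The sketch below applies one ADAM step per call; the toy quadratic at the end is not from this work and only illustrates that the update drives the parameter toward the minimum:

```python
import numpy as np

def adam_step(x, g, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update; returns the new parameters and moment estimates."""
    m = beta1 * m + (1.0 - beta1) * g        # first moment (running mean of g)
    v = beta2 * v + (1.0 - beta2) * g**2     # second moment (running mean of g^2)
    m_hat = m / (1.0 - beta1**t)             # bias corrections for the warm-up
    v_hat = v / (1.0 - beta2**t)
    x = x - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Illustrative use: minimize f(x) = x^2, whose exact gradient is 2x.
x = np.array([5.0]); m = np.zeros(1); v = np.zeros(1)
for t in range(1, 2001):
    x, m, v = adam_step(x, 2.0 * x, m, v, t, alpha=0.05)
print(float(x[0]))   # close to 0
```

Note that the noisy TSUNAMI gradient simply replaces `2.0 * x` in the loop above; the moment estimates are what average out the stochastic noise over steps.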

The implementation of ADAM for the challenge problems solved in this article uses the sensitivities from TSUNAMI directly as the gradient. TSUNAMI outputs sensitivities in two ways: material-based and element-based. For the challenge problems chosen, material sensitivities are used, because the problems optimize the location of the material rather than its composition. The TSUNAMI runs made at each step are constructed to use very little computation time, to test whether the algorithm works with an unconverged gradient. Each TSUNAMI run uses 10 skipped generations, 5 latent generations, 10 active generations, and 10,000 neutrons per generation, which allows these quickly computed, unconverged gradients to be used.