Optimal Designs for Nonlinear Mixed-effects Models Using Competitive Swarm Optimizer with Mutated Agents

Nature-inspired meta-heuristic algorithms are increasingly used in many disciplines to tackle challenging optimization problems. Our focus is to apply a newly proposed nature-inspired meta-heuristic algorithm called CSO-MA to solve challenging design problems in the biosciences and demonstrate its flexibility in finding various types of optimal approximate or exact designs for nonlinear mixed models with one or several interacting factors and with or without random effects. We show that CSO-MA is efficient and can frequently outperform other algorithms in terms of either speed or accuracy. The algorithm, like other meta-heuristic algorithms, is free of technical assumptions and flexible in that it can incorporate a cost structure or multiple user-specified constraints, such as a fixed number of measurements per subject in a longitudinal study. When possible, we confirm that some of the CSO-MA generated designs are theoretically optimal by developing innovative theory-based plots. Our applications include searching for optimal designs to estimate (i) parameters in mixed nonlinear models with correlated random effects, (ii) a function of parameters for a count model in a dose combination study, and (iii) parameters in an HIV dynamic model. In each case, we show the advantages of using a meta-heuristic approach to solve the optimization problem, and the added benefits of the generated designs.


Introduction
Meta-heuristics, and in particular nature-inspired meta-heuristics, are increasingly used to tackle complex optimization problems in the real world that traditional algorithms cannot solve. These include high-dimensional constrained nonlinear optimization problems, multiple-objective optimal design problems, and problems that do not have an explicit objective function to optimize. Meta-heuristics are now frequently used across disciplines to solve hard-to-optimize problems. For example, they are widely used in artificial intelligence [1], and during the pandemic, meta-heuristics were implemented in different ways to detect, track and predict COVID-19 infection rates. For example, [2] used meta-heuristics to develop cheap and fast diagnostic tools for detecting people infected with COVID-19 to reduce the pressure on healthcare systems. Another example is [3], who created a discrete gaining-sharing knowledge-based meta-heuristic algorithm and derived optimal disinfection scheduling strategies for cleaning public places during the pandemic. [4] reviewed numerous applications of meta-heuristics to study COVID-19. Meta-heuristics have also made significant contributions to using medical images to classify various diseases and predict disease progression in lung cancer, cardiovascular diseases and renal failure [5,6]. [7] discussed concrete applications of meta-heuristics to tackle healthcare and drug discovery issues. Most recently, [8] discussed the use of evolutionary algorithms to identify gender influence in rheumatic diseases.
Meta-heuristic algorithms include the genetic algorithm (GA), differential evolution (DE) and particle swarm optimization (PSO), among many others. Each of these algorithms has been tested successfully in many real applications. The more popular ones, like those just named, have many variants that improve the original versions in different ways. For example, the enhanced algorithms may converge faster, be less prone to premature convergence, or have a greater chance of extricating themselves from a local optimum. An example of a swarm-based meta-heuristic algorithm is the Competitive Swarm Optimizer (CSO), and one of its variants is CSO with mutated agents (CSO-MA). Hybridized algorithms that creatively combine suitable algorithms, meta-heuristic or not, can also markedly increase the performance of a meta-heuristic algorithm [9,10]. A guiding principle is that the hybridized algorithm should perform better than either of the algorithms used in the hybridization.
The aim of this paper is to apply nature-inspired meta-heuristics to tackle a variety of challenging optimal design problems in the biosciences. We focus on one such algorithm proposed by one of the authors, called the Competitive Swarm Optimizer with Mutated Agents (CSO-MA), and demonstrate its flexibility for solving a variety of more complex or high-dimensional design problems. When it is possible to theoretically verify the optimality of a design, we also show how this can be done using new innovative plots based on an equivalence theorem when the design space consists of all subsets of T positive integers. In the supporting material, we review CSO and CSO-MA and provide online Matlab and Python packages for running CSO-MA, along with web-based tools and references to monographs on meta-heuristics. We also include some technical details for the applications.
In Section 2, we describe our statistical setup, approximate designs, optimality criteria and how to find and confirm the optimality of an approximate design. The statistical models are described, including those for designing a longitudinal study. Section 3 reports a variety of optimal designs found by CSO-MA for different types of models and design criteria. They include locally D- and c-optimal designs for different types of mixed nonlinear models, Bayesian optimal designs for estimating parameters in an HIV dynamics model, and locally D-optimal designs for high-dimensional generalized linear models. Section 4 summarizes the key findings in the paper and emphasizes the broad applicability and usefulness of meta-heuristics for tackling hard-to-optimize problems, especially when mathematical programming methods are not applicable.

Statistical Models and Fundamentals of Optimal Design Construction
We first present statistical models and fundamentals in optimal design construction. Details can be found in design monographs, such as [11] and [12]. Suppose the statistical model has the form E(y) = f(x, β), where y is the response and E(y) is the mean response that depends on a vector of explanatory factors x via a known nonlinear function f(x, β), apart from the unknown model parameters β ∈ R^p. The responses may be uncorrelated or correlated, and they need not be normally distributed. Given a design criterion, design issues concern how to optimally select values of the explanatory factors x from a pre-selected compact design space Ω at which to observe the response.
Designs can be exact or approximate. The main difference between the two types is that finding an optimal approximate design is a convex optimization problem. Convex analysis theory can then be used to construct algorithms for finding optimal approximate designs and for confirming the optimality of an approximate design. In contrast, finding an optimal exact design for a model requires solving a number-theoretical problem unless the regression model is extremely simple. There is no general theory for finding and confirming the optimality of an exact design. For this reason, we discuss mainly approximate designs but also provide examples of efficient exact optimal designs found by our algorithm.
Given an optimality criterion, approximate designs assume the given total number of observations N is large, and we determine the number n of design points in the optimal design, where these design points x_i are located, and the proportions w_i of observations to be taken at the design points. The design problem is to find optimal values of n and the pairs (x_i, w_i), subject to each proportion being positive, w_1 + ... + w_n = 1 and each x_i ∈ Ω. Some approximate designs η with their weights and design points are displayed in Table 1.
The worth of a design η is measured by its Fisher information matrix M(η, β) [13] for the model. When random effects are present, calculating the information matrix can be challenging unless the models are relatively simple. In practice, the matrix is often approximated before use. Design efficiency is usually defined as the ratio of a design's criterion value to that of the optimal design, with the exception of D-efficiency, which is defined as the p-th root of the ratio of the two determinants. This is done to maintain the interpretation of the design efficiency, scaled between 0 and 1 and then multiplied by 100. If a design has 50% efficiency, then the design has to be replicated twice for it to do as well as the optimum.
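The D-efficiency computation described above can be sketched as follows; the information matrices below are hypothetical and purely illustrative.

```python
import numpy as np

# D-efficiency of a design relative to the optimum: the p-th root of the
# ratio of the two determinants, so 50% efficiency means the design must
# be replicated twice to do as well as the optimum.
def d_efficiency(M_design, M_opt):
    p = M_design.shape[0]
    return (np.linalg.det(M_design) / np.linalg.det(M_opt)) ** (1.0 / p)

M_opt = np.array([[2.0, 0.0], [0.0, 2.0]])   # hypothetical optimal information
M_half = 0.5 * M_opt                         # half the information per observation
eff = d_efficiency(M_half, M_opt)            # 0.5, i.e. 50% D-efficiency
```

Taking the p-th root makes the efficiency invariant to the number of parameters and preserves the replication interpretation.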

Optimality Criteria and Equivalence Theorems
In practice, design criteria are formulated as either concave or convex functions of the information matrix, and common ones are D- and c-optimality. For nonlinear models, the information matrices depend on the unknown parameters that we want to estimate. To this end, we assume nominal values of the parameters are available from either a pilot study or a similar study [12,14]. Because the optimal designs depend on the nominal values, they are only locally optimal. For instance, if β_0 is the vector of nominal values for β, the locally D-optimal design maximizes log{det(M(η, β_0))} and the c-optimal design maximizes −c^T M^{-1}(η, β_0)c, where c is a user-specified column vector if the aim is to estimate c^T β. In either case, the optimization is over all designs on the given compact design space.
The D- and c-optimality criteria are both concave functions, so their directional derivatives always exist. We use them to evaluate the D-optimality of a design η via the sensitivity function S_D(η, x) = g(x)^T M^{-1}(η, β_0) g(x) − p, where g(x) = ∂f(x, β)/∂β evaluated at β_0 and p is the number of parameters in the model. This leads to the general equivalence theorems discussed in [13] for checking the optimality of a design among all designs on the given design space Ω. For D-optimality, the theorem asserts that the design η⋆ is locally D-optimal if and only if S_D(η⋆, x) ≤ 0 for all x ∈ Ω, with equality at the design points of η⋆. Further details are available in [15], where equivalence theorems are derived for several convex (equivalently, concave) criteria. For example, the sensitivity function for local c-optimality of a design η is S_c(η, x) = (c^T M^{-1}(η, β_0) g(x))^2 − c^T M^{-1}(η, β_0)c, and η⋆ is locally c-optimal if and only if S_c(η⋆, x) ≤ 0 for all x ∈ Ω, with equality at all design points of η⋆.
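The equivalence-theorem check can be sketched numerically. The example below uses the simple linear model E(y) = b0 + b1*x on [-1, 1], a stand-in chosen because its D-optimum (equal weights at -1 and 1) is known in closed form; it is not one of the paper's models.

```python
import numpy as np

# With gradient g(x) = (1, x)^T, the sensitivity function
# S_D(eta, x) = g(x)^T M^{-1}(eta) g(x) - p must be <= 0 on the
# design space, with equality at the support points of a D-optimal design.
def sensitivity_D(x, support, weights):
    G = [np.array([1.0, t]) for t in support]
    M = sum(w * np.outer(g, g) for w, g in zip(weights, G))
    gx = np.array([1.0, x])
    return gx @ np.linalg.inv(M) @ gx - M.shape[0]

# Candidate design: equal weights at -1 and 1, the known D-optimum.
grid = np.linspace(-1.0, 1.0, 201)
vals = [sensitivity_D(x, [-1.0, 1.0], [0.5, 0.5]) for x in grid]
max_val = max(vals)   # ~0, attained at the support points -1 and 1
```

A maximum of zero over the design space, attained exactly at the support points, confirms D-optimality; a positive maximum flags a better design point to add.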
In what follows, we provide 7 applications of CSO-MA to demonstrate its flexibility in finding various types of optimal designs for both linear and nonlinear models with discrete or continuous outcomes. Unlike the majority of work in this area, almost all of our models have random effects. Our designs can be exact or approximate, and they can be locally or Bayesian optimal. The first two applications incorporate a cost structure into the design construction and differ from some of the latest research in this area, which often assumes a fixed-effects model [16][17][18]. The last application concerns finding a Bayesian optimal design for a nonlinear model for studying HIV dynamics. There are recent packages/tools, such as that of [19], for finding optimal designs for nonlinear mixed models, but these are model-specific and not amenable to finding some of the optimal designs reported herein.
In the implementation stage, we generally used φ = 0.1 and a swarm size of 128 in CSO-MA (except for Applications 3 and 6, where we used 256 and 16, respectively). If and when these inputs do not work well, as with other meta-heuristics, we change their values somewhat. We assume nominal values are available for some of our applications; otherwise, we simulate and randomly choose a number of parameter sets for each model and then search for the optimal designs. The simulated nominal values for each model are shown, along with the CSO-MA generated design. All computations were done on a Windows 10 PC with a 3.20GHz Intel i7-8700 CPU, 32GB DDR4 2666MHz memory and 512GB SSD storage. The programming languages are MATLAB 2020a and Python 3.9.13. The codes are available from the first author.

Optimal Design Problems
In this section, we apply a state-of-the-art meta-heuristic algorithm to find a variety of new optimal designs in various complex situations. The rationale and descriptions of the selected meta-heuristic algorithms are given in the supporting material.

Experimental Costs
Costs are incurred when a measurement is made and may depend on the time point at which the observation is taken. Suppose the cost of taking an observation at time point η_k of the design η is given by a known function c(η_k). We normalize the information matrix by the cost function and work with the normalized information matrix M*(η), obtained by dividing the information contributed at each design point by its cost. The advantages of incorporating cost into the design have been discussed in [20], and a common cost function is c(η_k) = α_1 + α_2 n_k, where n_k is the number of time points/measurements taken at η_k, and α_1, α_2 are user-defined. The normalized information matrix is reminiscent of the common case in the optimal design literature when errors are heteroscedastic and the inverse of the error variance at a point is represented by the efficiency function. When the cost structure is linear, the optimal design depends on the ratio of the two parameters r = α_1/α_2 and not on the individual values of α_1 and α_2. In the following subsections, we apply CSO-MA to generate designs for estimating parameters in various mixed models, some of which include cost considerations. All generated approximate designs have been verified to be optimal; for space considerations, we only show some of the sensitivity plots that confirm the optimality of the CSO-MA generated designs.
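The cost normalization under the linear cost function can be sketched as follows; the per-group information matrices and weights are hypothetical. Writing c(η_k) = α_2(r + n_k) with r = α_1/α_2 shows that the normalized matrix depends on the cost parameters only through r, up to a constant factor that does not affect the optimal design.

```python
import numpy as np

# Cost-normalized information under the linear cost c(eta_k) = alpha1 + alpha2*n_k:
# each group's information M_k is down-weighted by its cost.
def normalized_info(M_list, w_list, n_list, alpha1, alpha2):
    return sum(w * M / (alpha1 + alpha2 * n)
               for w, M, n in zip(w_list, M_list, n_list))

M_list = [np.eye(2), 2.0 * np.eye(2)]        # hypothetical group information
w_list, n_list = [0.4, 0.6], [3, 5]          # weights and measurements per group
A = normalized_info(M_list, w_list, n_list, alpha1=1.0, alpha2=2.0)  # r = 0.5
B = normalized_info(M_list, w_list, n_list, alpha1=2.0, alpha2=4.0)  # r = 0.5
# A and B share r = 0.5, so B is exactly A / 2 (all costs doubled),
# and both lead to the same optimal design.
```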

Fractional Polynomial Regression Models
Suppose we wish to monitor the pulmonary function of the lungs of patients periodically in a clinical trial. A common measure is forced vital capacity, which is a continuous variable. Polynomials are traditionally used to model the outcome over time but, increasingly, fractional polynomial models are used because they are more versatile and frequently provide better fits to the mean response over time [21]. Fractional polynomials are polynomials that allow positive and negative fractions as powers for the monomials. [22] proposed such models and suggested that it is adequate to select powers of the monomials from the set S = {−2, −1, −0.5, 0, 0.5, 1, 2, 3}. For our problem, we assume observations can be taken at times chosen from a discrete set T = {1, 2, ..., T_max}, where T_max is given, and we want to determine the number and locations of the sampling time points for each subject in an optimal way.
Consider a linear fractional polynomial model with a subject random effect, where we assume b_i ∼ N(0, σ_b^2) and e_ij ∼ N(0, σ_e^2). Here t_ij refers to the time point of the j-th visit by individual i, b_i is a random intercept term assumed to have a normal distribution with zero mean and variance σ_b^2, and e_ij is an error term following a normal distribution with zero mean and variance σ_e^2. All random effects and error terms are assumed to be mutually independent of one another. Our interest is to find the locally D-optimal design to estimate the model parameters β = (β_0, ..., β_7)^T by selecting optimal subsets of time points from T among all possible subsets of the pre-selected discretized design space.
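A sketch of the Fisher information for the fixed effects in this random-intercept model follows. We assume one term per power in S, with the usual Royston-Altman convention that the power 0 denotes log(t); the visit times and variance components below are illustrative choices, not the paper's.

```python
import numpy as np

POWERS = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]

def fp_row(t):
    # Fractional polynomial model vector at time t (power 0 means log t).
    return np.array([np.log(t) if p == 0 else float(t) ** p for p in POWERS])

def info_random_intercept(times, sigma2_b, sigma2_e):
    X = np.vstack([fp_row(t) for t in times])              # design matrix
    n = len(times)
    # Random intercept induces compound-symmetric covariance for y_i.
    V = sigma2_e * np.eye(n) + sigma2_b * np.ones((n, n))
    return X.T @ np.linalg.solve(V, X)                     # X^T V^{-1} X

M = info_random_intercept([1, 2, 3, 5, 8, 10], sigma2_b=1.0, sigma2_e=1.0)
```

With fewer visit times than parameters, a single group's information matrix is singular, which is why the optimal designs in Table 1 spread subjects over several sets of time points.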
Suppose η is a design with n design points η_1, ..., η_n and each design point η_k has n_k time points (t_k1, ..., t_kn_k). Let the weight for each subset of design points from T for the design η_k be w_j, subject to the weights summing to one. Application 1: Suppose SR = 2, r = 0.5 and T_max = 10. Table 1 shows that CSO-MA took 2.0 seconds of CPU time to find design η_1 with a D-criterion value of -33.200. This generated design has two groups of subjects, with about 83% of subjects having six measurements at times 1, 2, 3, 5, 8, 10 and 17% of subjects having six measurements at times 1, 2, 3, 6, 8, 10. This design ensures that all the fixed parameters are estimated with maximum efficiency under the D-criterion.
Application 2: Suppose SR = 2, r = 5 and T_max = 10. Table 1 shows the CSO-MA generated design is η_2, which can be interpreted in the same way as in the first case. It has a D-criterion value of -36.141 and CSO-MA took 2.0 seconds to find the design.
There are 2^10 − 1 = 1023 possible non-empty subsets of time points. The first choice requires that subjects be observed once at the end of the first hour, and the last choice requires that subjects be observed every hour for 10 hours. The optimal approximate design selects what percentage of subjects are observed at different sets of time points and what the time points are within each set. To construct the plot, we first order the sets of time points according to their values of the sensitivity function and then plot the function across the ordered 1023 sets of time points.
Figure 1 shows the uninformative plot of the sensitivity function on the left and the plot with properly ordered sets of time points on the right. The latter confirms the optimality of the generated designs for the two applications. We consider the right plot quite innovative, since we have not seen such a display in this setting before.
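The enumeration and ordering behind the right-hand plot can be sketched as follows; a placeholder score stands in for the real sensitivity function, which comes from the fitted design.

```python
from itertools import combinations

# The design space in Applications 1 and 2 is the collection of all
# non-empty subsets of T = {1, ..., 10}: 2^10 - 1 = 1023 candidate sets
# of visit times.  The ordered sensitivity plot evaluates the sensitivity
# function at every subset and plots the values against the subsets
# sorted by that value.
T_max = 10
subsets = [s for k in range(1, T_max + 1)
           for s in combinations(range(1, T_max + 1), k)]
n_subsets = len(subsets)                            # 1023

toy_score = {s: -abs(len(s) - 6) for s in subsets}  # placeholder sensitivity
ordered = sorted(subsets, key=toy_score.get)        # x-axis ordering
```

Sorting by the sensitivity value turns an otherwise uninformative scatter over arbitrarily indexed subsets into a monotone curve whose maximum of zero at the support sets confirms optimality.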

Application 3: Fractional Polynomial Models with Correlated Random Coefficients
We show that CSO-MA can generate optimal designs for models with multiple correlated random effects. Our example is illustrative in that the nominal model parameters and the covariance matrix of the random effects are arbitrarily selected. We assume we want to take four observations for each subject from the time interval t ∈ [1, 10].
The model is a fractional polynomial model with correlated random coefficients, and its information matrix is derived in a similar manner as in [23]. Since the value of σ_e^2 does not affect the optimization, we set it equal to 1. We used 256 particles, set φ = 0.1, and the algorithm converged in 10∼20 seconds.
Table 1 displays the CSO-MA generated locally D-optimal design η_3 for Application 3. Its criterion value is -60.583. It shows that the total sample of subjects is divided unequally into four subgroups, and subjects in each subgroup are observed at four time points, with these time points varying from subgroup to subgroup. For example, we observe from the first row of the design that one subgroup has 34.7% of the total number of subjects and each subject in it is observed at the time points 1.878, 4.224, 8.297 and 10.000. The other optimal designs are interpreted similarly.

Mixed Logistic Regression Models
Binary outcomes are ubiquitous, and a logistic regression model with mixed effects over time can be used to model the longitudinal data. The probability of a response from individual i at time point t_j may be described by a logistic model with a random intercept term, and each time point may be chosen from a given discrete set. We first consider a one-factor logistic mixed model. Suppose that the i-th subject has n_i measurements at t_i1, ..., t_in_i, and p_ij is the probability that the i-th subject responds to a treatment at time t_ij. We treat the subject effect b_i as random. The research question is to find optimal time points for each patient to estimate the parameters β_0 and β_1 as accurately as possible.
The log-likelihood function of the above model does not have a closed form, and the information matrix thus cannot be derived analytically. One solution is to use a first-order penalized quasi-likelihood (PQL) approximation to the true likelihood function [24,25]. Using the PQL method and assuming an n-point design η with each design point η_i having n_i time points/measurements, the information matrix can be approximated in terms of the design matrices T_i, the n_i-dimensional square matrix J whose elements are all ones, and diagonal matrices W_i. Application 4: A Logistic Model with Fractional Polynomials. The above model now has fractional polynomial terms in the link function, where b_i ∼ N(0, σ_b^2). After the information matrix is approximated by the PQL method, CSO-MA searches for the optimal design. For example, we assume the values of the nominal parameters are σ_b^2 = 0.2 and β = (1.0, 0.2, −3.0, 0.5, −1.2)^T, the cost ratio is either r = 6 or r = 0.3, and we want to optimally choose the time points t_ij for each subject from the set {1, 2, ..., 6}. Table 1 displays the CSO-MA generated locally D-optimal designs η_4-1 for r = 6 and η_4-2 for r = 0.3. The CPU time for the first (second) search was 28 (27) seconds and its criterion value is -49.381 (-45.876).
These two optimal designs, along with those in Applications 1 and 2, suggest that more groups of subjects and more time points may be needed in the optimal design when the cost ratio r in the linear cost function becomes larger.
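The first-order PQL approximation used above can be sketched for the one-factor mixed logistic model. The specific working form below, M ≈ T^T (W^{-1} + σ_b^2 J)^{-1} T with W = diag(p_ij(1 − p_ij)) evaluated at b_i = 0 and J a matrix of ones, is a standard PQL approximation and is our assumption for the formula omitted in this version.

```python
import numpy as np

# PQL-approximated information for logit(p_ij) = beta0 + beta1*t_ij + b_i,
# with b_i ~ N(0, sigma2_b); times, beta and sigma2_b below are illustrative.
def pql_info(times, beta, sigma2_b):
    T = np.column_stack([np.ones(len(times)), np.asarray(times, float)])
    p = 1.0 / (1.0 + np.exp(-T @ beta))       # response probabilities at b_i = 0
    W = np.diag(p * (1.0 - p))                # GLM weights
    n = len(times)
    V = np.linalg.inv(W) + sigma2_b * np.ones((n, n))   # working covariance
    return T.T @ np.linalg.solve(V, T)

M = pql_info([1, 2, 3, 4], beta=np.array([1.0, -0.5]), sigma2_b=0.2)
```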

c-optimal Design for Negative Binomial Regression Models
To demonstrate the flexibility of our approach, we now design for a count model with mixed effects in a longitudinal study. The negative binomial regression model is flexible in that it can be used to model over-dispersed or under-dispersed data. The count variable can be the number of new flares in scleroderma patients or the number of new lesions in patients after a new treatment regimen over time. [26] found an optimal design for a phase I/II clinical trial for treating multiple sclerosis with gadolinium-enhanced lesions as the endpoint. That approach lacks theory: the design was selected from among a few candidate designs, based entirely on simulated error rates produced by the various designs. Optimal designs for Poisson and negative binomial regression models with fixed effects, or with a random intercept and one or two additive factors, were reported in [27] and [28], respectively.
Our models here are negative binomial mixed models and, unlike previous work, we add the constraint that all subjects are observed at a user-selected number T of time points. Such a constraint arises in situations where, for example, taking observations is laborious, expensive, or even risky, as in pediatric trials where only a limited number of measurements are allowed for the young subjects. In what follows, we show CSO-MA can directly accommodate such a constraint without difficulty. The derivations are modeled after [29] and are given in the supporting information material for completeness.
As an application, we consider a two-drug trial and denote the combination of drug treatments randomly assigned to the i-th subject by x_i, where x_ik is the dose level of drug k. We assume all drug levels have been normalized to [−1, 1]. We further assume that once a drug level is determined for a patient, it remains unchanged throughout the trial. For administrative purposes, each patient is observed T times at time points chosen from the set {1, 2, 3, ..., T_max}, where T_max is pre-determined. A negative binomial regression model is used to study the effects of the drug combinations on the count outcome. The count outcome may be the number of allergic reactions after each treatment, or the number of new lesions seen after each treatment as in [26]. We chose the negative binomial regression model over the commonly used Poisson regression model because the former is more flexible and can capture under- or over-dispersion.
Specifically, we have an additive negative binomial mixed model in two variables with a time trend and a log link. Here the parameter a is the dispersion factor; if it is positive, it suggests that the data are over-dispersed, which is usually the case for real data. The information matrix of model (5) can also be approximated by the PQL method, as in the last subsection.
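The model's mean and variance can be sketched as follows. The linear predictor below (intercept, two dose effects, a linear time trend and a random intercept) is our reading of the display omitted in this version, and the numbers are illustrative only.

```python
import numpy as np

# Negative binomial mixed model: log mu = b0 + b1*x1 + b2*x2 + b3*t + b_i,
# with dispersion a.  For a > 0 the variance exceeds the mean
# (over-dispersion relative to Poisson).
def nb_mean(x1, x2, t, beta, b=0.0):
    return np.exp(beta[0] + beta[1] * x1 + beta[2] * x2 + beta[3] * t + b)

def nb_var(mu, a):
    return mu + a * mu ** 2

mu = nb_mean(1.0, -1.0, 2.0, beta=[0.5, 0.3, 0.1, -0.2])
v = nb_var(mu, a=0.4)
over_dispersed = bool(v > mu)   # True whenever a > 0
```

Under this parameterization, the question of whether the two drugs are equally effective at the same dose corresponds to c-optimality with a vector c that picks out the contrast between the two dose coefficients.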
Application 5: c-optimal Approximate Design for a Negative Binomial Model. Each subject in the clinical trial receives a dose combination of the two drugs and is observed at only T = 3 out of the T_max = 6 possible time points (one observation per week). The goal is to find an optimal design that determines which dose combinations and which three time points are best for answering the question: are the two drugs equally effective if the same dosage levels are given to the patients?
The CSO-MA generated design η_5 assigns an equal number of patients to two dose combinations of the two drugs. Responses from each patient are observed three times, out of the 6 possible time points. The first group receives a combination of the extreme doses from the two drugs and the second group receives the other set of extreme doses. One group is observed at the first, second and sixth weeks, and the other group is observed at the second, fifth and sixth weeks.

Bayesian Optimal Exact Designs for Nonlinear Mixed Models Applied to HIV Dynamics
Finding Bayesian optimal designs for real applications is challenging because (i) the specification of the nonlinear model and the prior distribution can be problematic, (ii) the design criterion does not have a closed form and involves multiple integrals, which require approximation using a random sampling scheme, and (iii) there is a lack of effective algorithms for numerically optimizing the design criterion or a utility function in a general setting. This subsection shows that CSO-MA can be effective in searching for Bayesian optimal exact designs for estimating model parameters in a mixed-effects nonlinear model for an HIV trial studied by [30]. Recall that an exact design is characterized by a user-specified value of N, the total number of observations allowed for the study, and we want to determine the optimal number of sampling times, k, and the optimal number of replicates n_i at the i-th time point, t_i, i = 1, ..., k, such that n_1 + ... + n_k = N. Optimality is based on the user-specified design criterion.
Frequently, medical experts suggest a few designs that they believe are meaningful for the study. For instance, physicians may believe that there should be 6 sampling times within 2 hours, and pharmacokinetic/pharmacodynamic (PK/PD) considerations may require more observations near the start and end of the sampling period. A few candidate designs are then proposed and, for modelling purposes, a statistical model is postulated by the experts in PK/PD. If the goal is to estimate all parameters in the model for prediction purposes, one computes the determinants of the information matrices of these few designs and implements the 6-point design with the largest determinant. The optimization is then over the pre-selected candidate designs only and not over all 6-point designs. This suggests that even though there were a few pre-selected designs to choose from at the onset, additional options are worth considering if there are other designs that are noticeably more efficient and not too dissimilar from the candidate designs. We explore this possibility in this subsection.
Application 6: There are 8 candidate designs for the HIV-dynamics model, each with 16 sampling time points from the time interval [0, 7]. The goal is to accurately estimate 3 parameters in the nonlinear mixed-effects model and determine which of the 8 designs is the optimum. The design criterion is to minimize the determinant of the covariance matrix of the 3 estimated parameters. The information matrix does not have a closed form, and there are various ways to approximate it; see, for example, [31]. The design criterion contains multiple integrals and has to be numerically optimized. The optimization can be over the 8 candidate 16-point designs or over all 16-point designs on the interval [0, 7].
[30] used a Bayesian approach to determine which of the eight candidate designs is best. Two prior densities were used, and the model used to study plasma concentration under protease inhibitor monotherapy in the HIV dynamics study was a nonlinear mixed model. The information matrix for each design did not have a closed form, and we approximated it using the penalized quasi-likelihood methodology, as before. The design with the largest determinant of its information matrix among the 8 candidate designs was then selected. In the work to follow, we use CSO-MA to find the best Bayesian exact optimal design among all designs with 16 time points.
Let y_ij be the plasma concentration from subject i at time t_j after the pharmacologic delay of the drug. Using the same notation and model as in their paper, which is a nonlinear mixed-effects model, the natural logarithm of y_ij is modeled as a nonlinear subject effect s(θ_i, t_j) plus normal noise, where N_p(u, V) denotes a p-dimensional normal distribution with mean u and covariance V, σ^2 is the noise level, θ_i is the random effect vector of subject i, and s(θ_i, t_j) is the nonlinear effect of subject i at time t_j. Further, we assume θ_i ∼ N_3(µ, Σ), where µ = (µ_1, µ_2, µ_3)^T = (log V_0, log c, log δ)^T is the population-level mean of θ_i and Σ is the population-level covariance matrix. For subject i, V_0i is the plasma concentration of HIV particles at treatment initiation, c_i is the virion clearance rate, and δ_i is the rate at which the infected CD4 cells die [32]. The expression for s(θ_i, t_j) is obtained by solving a system of differential equations that describes the interactions among virus particles, target cells and infected cells [33]. [32] proposed a more complicated hierarchical Bayesian model for studying HIV dynamics by allowing prior beliefs to be specified for the variance and the covariance matrix, where G(α, β) is the Gamma distribution with shape α and rate β and W(Φ, γ) is the Wishart distribution with scale matrix Φ and degrees of freedom γ. The θ_i's are random effects generated from N_3(µ, Σ). Other prior distributions may be used.
Two particularly interesting parameters to estimate are µ_2 = log c and µ_3 = log δ. To incorporate prior information on the parameters, Bayesian design criteria were used to find time points t = (t_1, ..., t_T)^T that minimize the expected posterior variances E[Var(µ_2|y)] and E[Var(µ_3|y)] or, equivalently, maximize the utility functions −E[Var(µ_2|y)] and −E[Var(µ_3|y)]. [32] compared eight 16-point candidate designs, and their performance is reported in Table 1 of [32].
Finding the Bayesian optimal design is not a trivial problem because we need to draw samples from the posterior distribution, which does not have an explicit form. The distribution of µ|y in this application cannot be obtained analytically, so we resort to MCMC methods to approximate the posterior distribution. The procedure is as follows. 1. Given an input t and hyper-parameters (α, β, η, Λ, Φ, γ), we hierarchically draw random samples for µ, Σ, and then θ and y.
2. Given an observation y, we use formula (7) and apply the M-H algorithm to estimate the posterior variance of µ|y.A multivariate normal distribution is used as the proposal distribution, which is also the proposal distribution used in [32].
After supplying the values of the hyper-parameters of the prior distributions, we also have to supply values for M, N and K, whose definitions can be found in the package file. The details of the whole procedure are given in the appendix.
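The nested scheme above can be sketched in miniature for a toy conjugate model (normal mean with known variances), where the exact posterior variance is available to sanity-check the Metropolis-Hastings inner step. The paper's three-step algorithm has the same structure, with the HIV model and its hierarchical prior in place of this toy; all settings below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

tau2, sigma2, n_obs = 1.0, 1.0, 4                   # prior var, noise var, obs per draw
exact_post_var = 1.0 / (1.0 / tau2 + n_obs / sigma2)  # = 0.2 for every y (conjugacy)

def mh_posterior_var(y, n_iter=3000, step=0.8):
    # Random-walk M-H targeting p(mu | y) for the toy model.
    def logpost(m):
        return -0.5 * m * m / tau2 - 0.5 * np.sum((y - m) ** 2) / sigma2
    m, lp, draws = 0.0, None, []
    lp = logpost(m)
    for _ in range(n_iter):
        prop = m + step * rng.standard_normal()
        lpp = logpost(prop)
        if np.log(rng.random()) < lpp - lp:
            m, lp = prop, lpp
        draws.append(m)
    return float(np.var(draws[n_iter // 2:]))        # discard burn-in half

def expected_posterior_variance(n_outer=20):
    # Outer loop: draw (mu, y) hierarchically from the prior predictive;
    # inner loop: M-H estimate of Var(mu | y); average estimates E[Var(mu|y)].
    vals = []
    for _ in range(n_outer):
        mu = rng.normal(0.0, np.sqrt(tau2))
        y = rng.normal(mu, np.sqrt(sigma2), size=n_obs)
        vals.append(mh_posterior_var(y))
    return float(np.mean(vals))

est = expected_posterior_variance()   # should be close to exact_post_var = 0.2
```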
Table 2 displays the CSO-MA estimates of E[Var(µ_2|y)] and E[Var(µ_3|y)], and they agree with the results in Table 2 of [32]. This suggests that the above three-step algorithm has worked well for this problem. For space considerations, we report CSO-MA results for candidate designs 1-4 only, under the same two prior distributions as in their paper. Table 3 displays the four CSO-MA generated 16-point Bayesian exact designs under each prior distribution and, as expected, all have smaller criterion values than all the candidate designs (see Table 2). The percentages in parentheses in the last column of Table 3 are for the candidate designs 1-4 produced from our implemented 3-step algorithm; the corresponding standard deviation (std) of the criterion value, averaged over 10 repetitions, is also reported, and the results from our implemented 3-step algorithm are very close to those in Han and Chaloner (2004). Under the priors π_1 and π_2, the respective percentages are 26% and 71%, which suggests that Bayesian exact optimal designs are sensitive to the specification of the prior densities. Figure 2 displays the design points of the 8 candidate designs and those of the 16-point Bayesian designs found by CSO-MA for estimating E[Var(µ_2|y)] and E[Var(µ_3|y)] under the prior density π_1. For space considerations, we omit the results when the prior distribution is π_2 and the corresponding results for estimating µ_1. The figure shows that both 16-point Bayesian optimal designs require one or more observations at the extreme point T = 7 but no observation at the start of the study. They tend to be more concentrated in the interval [2, 6], with the design for estimating µ_3 more spread out. Because the CSO-MA generated designs optimize the design criterion among all 16-point designs, we can calculate efficiencies of the candidate designs.
The message here is that with the CSO-MA generated designs, physicians can now be made aware of the statistical inefficiencies of the candidate designs and hopefully, the implemented design is an informed choice and represents a compromise between practical needs and statistical efficiency.

High Dimensional Locally D-optimal Design for Generalized Linear Models
In a generalized linear model (GLM) setting, there are usually multiple interacting factors (covariates), because a couple of explanatory factors may not capture the complex structure of the full data adequately. With more factors comes an increasing number of parameters in the model. For example, a 5-factor GLM with all second-order interactions has 16 parameters. Consequently, to find an optimal design to estimate all parameters, the design needs at least 16 points, which means we have to solve a constrained optimization problem of dimension 16 × 6 = 96 or more. This subsection shows that CSO-MA can find locally D-optimal designs in a high-dimensional setting and that it tends to outperform other meta-heuristics. Our models are the same as those in [34] so that we can compare CSO-MA results with their results obtained from particle swarm optimization (PSO), CSO and the genetic algorithm (GA), which are well known meta-heuristic algorithms. Similar techniques can also be applied to find locally optimal designs for higher-dimensional models with more than 5 interacting factors. For example, [35] and [36], respectively, used differential evolution and quantum PSO, a variant of PSO, to find locally D-optimal designs for logistic models with 10 interacting factors.
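The dimension count above can be verified directly: an intercept, 5 main effects and C(5, 2) = 10 two-factor interactions give 16 parameters, and a 16-point design with one weight per point has a 96-dimensional search space.

```python
from math import comb

n_factors = 5
# intercept + main effects + all two-factor interactions
n_params = 1 + n_factors + comb(n_factors, 2)
print(n_params)  # 16

# each design point has 5 coordinates plus 1 weight, so a 16-point
# design corresponds to a 16 x 6 = 96-dimensional optimization problem
n_points = n_params
dim = n_points * (n_factors + 1)
print(dim)  # 96
```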
Application 7 We assume that we have logistic or Poisson regression models, all with 5 factors and including all two-factor interactions. There are different sets of nominal values and the goal is to find a locally D-optimal design for estimating all parameters in the model. Following [37], the Fisher information matrix of a design ξ with n observations and nominal value θ for the parameters is given by

M(ξ, θ) = ∑_{i=1}^{n} p_i f(x_i) f(x_i)^T,

where f(x_i)^T is the i-th row of the design matrix.
Here p_i is the variance of the i-th observation at x_i, i = 1, 2, …, n. For Poisson models with mean parameter λ_i, p_i = λ_i, and for logistic models with mean (success probability) λ_i, p_i = λ_i(1 − λ_i). The goal is to find a locally D-optimal approximate design ξ* for estimating all parameters in the mean function. This means that we have to determine the optimal number k of support points in the design, the support points x_i and the proportion ω_i of the n observations to be taken at x_i, i = 1, …, k, i.e. ξ* = {(x_i, ω_i), i = 1, …, k}.
We note that the number of design points k cannot be less than the number of parameters (16 in this case); otherwise the information matrix is singular. Our experience is that it is helpful to initiate the algorithm and search for the optimum using a large value of k, because the algorithm can only find a design with up to k support points. A rough guide is to choose k to be about twice the number of parameters in the model.
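A minimal sketch of the objective in this search is the log-determinant of the information matrix, shown below for a Poisson response (so p_i equals the mean). The model-matrix expansion is the 5-factor, all-two-factor-interaction structure described above, but the nominal values and design points here are illustrative placeholders, not those of [34]:

```python
import numpy as np
from itertools import combinations

def model_row(x):
    """Row f(x): intercept, 5 main effects, all two-factor interactions."""
    x = np.asarray(x, dtype=float)
    inter = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return np.concatenate(([1.0], x, inter))

def log_det_info(points, weights, theta):
    """log det of M(xi, theta) = sum_i w_i p_i f(x_i) f(x_i)^T for a
    Poisson model, where p_i = exp(f(x_i)^T theta) is the mean/variance."""
    M = np.zeros((len(theta), len(theta)))
    for x, w in zip(points, weights):
        f = model_row(x)
        p = np.exp(f @ theta)            # Poisson: variance equals the mean
        M += w * p * np.outer(f, f)
    sign, logdet = np.linalg.slogdet(M)
    return logdet if sign > 0 else -np.inf   # singular M is penalized

# illustrative: 16 random points in [-1, 1]^5 with equal weights
rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(16, 5))
w = np.full(16, 1 / 16)
theta = rng.normal(0, 0.1, size=16)      # hypothetical nominal values
print(log_det_info(pts, w, theta))
```

A D-optimal design maximizes this quantity over the design points and weights, which is exactly the 96-dimensional constrained problem CSO-MA is asked to solve.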
Table 4 displays the simulated criterion values of designs found by CSO-MA for the two logistic models and the two Poisson models, along with their standard deviations and average run times. All models have different nominal parameter values, shown in Table 1 of [34]. Compared with the results in Table 2 of [34], it is evident that CSO-MA clearly and consistently outperforms the 3 other meta-heuristic algorithms: GA, PSO and CSO. The outperformance of CSO-MA relative to CSO is not surprising because the former is a variant of CSO. The overall comparison shows that CSO-MA is able to produce more stable and more efficient designs with shorter run times than the other meta-heuristic algorithms. As an example, Table 5 shows the locally D-optimal design found by CSO-MA for Poisson Model 2. This CSO-MA generated design has 16 design points, 5 fewer than that found in [34], and higher efficiency. Having fewer points in a design can be attractive if taking observations at different points is costly or laborious.

Discussion
Finding optimal designs for a nonlinear mixed-effects model with possibly correlated random effects and several interacting factors is challenging and under-researched. This paper demonstrates the utility of a nature-inspired meta-heuristic algorithm in tackling complex design problems for different types of models. The models can be linear or nonlinear, with or without random effects, and the errors can be correlated or not. As demonstrated in the first two applications, they can also incorporate cost structures in the design construction. The method also works for design problems with a prefixed number of possible time points, where we want to find an approximate design from a design space consisting of all possible subsets of the set {1, 2, …, T}, with T the predetermined end time of the study. Using CSO-MA, we found an optimal design among all designs defined on such a design space and provided innovative plots to confirm its optimality. We focused on D- and c-optimality, but the meta-heuristic algorithm CSO-MA should also be able to find optimal designs under other criteria, including Bayesian criteria, as well as optimal exact designs. We also showed that the proposed algorithm either outperforms or performs as well as its competitors in several setups. We close by noting that nature-inspired meta-heuristic algorithms are general-purpose optimization tools whose applications are not limited to solving design problems, or even optimization problems. For example, they can also be used creatively to solve a system of nonlinear equations [38]. In summary, meta-heuristic algorithms, and particularly nature-inspired meta-heuristic algorithms, are especially useful when conventional algorithms or mathematical programming approaches are not applicable or fail to provide a quality solution to an optimization problem.

Supplementary information.
There are many monographs on nature-inspired meta-heuristic algorithms at various levels; for instance, the monograph [44] is at an introductory level and [45] is a bit more advanced. Some monographs target specific disciplines; for example, [46] focuses on agricultural applications and [47] is concerned with feature selection problems and has numerous applications in finance and reinforcement learning. Meta-heuristics has undoubtedly emerged as a set of dominant optimization tools in industry and academia [39], and in artificial intelligence research as well [1]. Their applications are plentiful and they have been shown capable of solving different types of complex and high-dimensional optimization problems [48-50]. R, Python and MATLAB codes are available at websites such as youtube.com and github.com, including professional websites such as Matlab.com and SAS.com. For example, the website https://pyswarms.readthedocs.io/en/latest/ has a set of PSO tools written in Python [51]. See also https://github.com/ElvisCuiHan/CSOMA, where codes are available for using CSO-MA to solve different types of estimation and design problems in statistics, including imputing missing data optimally or optimally selecting variables in ecology. [52] provided a high-level Python package for selecting machine learning algorithms and their parameters using PSO.

A.1 Competitive Swarm Optimizer
Competitive Swarm Optimizer (CSO) is a relatively novel swarm-based algorithm that has been proven to be very effective in solving different types of optimization problems [53].CSO has had successful applications to solve large and hard optimization problems.For example, [54] applied CSO to select variables for high-dimensional classification models, and [55] used CSO to study a power system economic dispatch, which is typically a complex nonlinear multivariable strongly coupled optimization problem with equality and inequality constraints.
The Competitive Swarm Optimizer algorithm, or CSO for short, minimizes a given function f(x) over a user-specified compact space Ω by first generating a set of candidate solutions. In our case, they take the form of a swarm of n particles at positions x_1^t, …, x_n^t at iteration t. After the initial swarm is generated, at each iteration we randomly divide the swarm into n/2 pairs and compare their objective function values. If x_i^t and x_j^t compete at iteration t and f(x_i^t) < f(x_j^t), we identify x_i^t as the winner and x_j^t as the loser. The winner retains the status quo and the loser learns from the winner. The two defining equations for CSO are

v_j^{t+1} = R_1 ⊗ v_j^t + R_2 ⊗ (x_i^t − x_j^t) + φ R_3 ⊗ (x̄^t − x_j^t),
x_j^{t+1} = x_j^t + v_j^{t+1},

where R_1, R_2, R_3 are random vectors whose elements are drawn from U(0, 1), the operation ⊗ represents element-wise multiplication, the vector x̄^t is the swarm center at iteration t, and the social factor φ controls the influence of the neighboring particles on the loser; a large value is helpful for enhancing swarm diversity (but possibly impacts the convergence rate). This process iterates until some stopping criterion is met.
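One CSO iteration can be sketched as follows; the update rules are the pairwise winner/loser equations above, while the test function, swarm size and bounds are illustrative choices of ours, not from the cited benchmarks:

```python
import numpy as np

def cso_step(X, V, f, phi, rng):
    """One CSO iteration: pair the particles at random; in each pair the
    loser updates its velocity and position, the winner is unchanged."""
    n, D = X.shape
    xbar = X.mean(axis=0)                            # swarm center at iteration t
    for a, b in rng.permutation(n).reshape(-1, 2):   # n/2 random pairs
        win, lose = (a, b) if f(X[a]) < f(X[b]) else (b, a)
        R1, R2, R3 = rng.random((3, D))              # element-wise U(0,1) vectors
        V[lose] = R1 * V[lose] + R2 * (X[win] - X[lose]) + phi * R3 * (xbar - X[lose])
        X[lose] = X[lose] + V[lose]
    return X, V

# illustrative usage: minimize the sphere function over [-5, 5]^10
rng = np.random.default_rng(1)
X = rng.uniform(-5, 5, size=(40, 10))   # swarm of n = 40 particles
V = np.zeros_like(X)
sphere = lambda x: float(np.sum(x ** 2))
init_best = min(sphere(x) for x in X)
for _ in range(200):
    X, V = cso_step(X, V, sphere, phi=0.1, rng=rng)
best = min(sphere(x) for x in X)
print(init_best, best)  # best should be far below init_best
```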
Simulation results have repeatedly shown that CSO either outperforms or is competitive with several state-of-the-art evolutionary algorithms, including several enhanced versions of PSO. This conclusion was reached by comparing CSO with state-of-the-art EAs on a variety of benchmark functions with dimensions up to 5000, where CSO was frequently the fastest and the winner in terms of solution quality [53,56-59].

A.2 Competitive Swarm Optimizer with Mutated Agents
Our experience with CSO is that many of the generated design points lie at the boundary of the design space Ω. This is helpful because many optimal designs, especially D-optimal designs, have support points on the boundary of Ω.
To this end, we propose an improvement on CSO and call the enhanced version Competitive Swarm Optimizer with Mutated Agents or, in short, CSO-MA. After pairing up the swarm into groups of two at each iteration, we randomly choose a loser particle p as an agent, randomly pick a variable indexed by q, and then randomly change the value of x_pq to either xmax_q or xmin_q, where xmax_q and xmin_q represent, respectively, the upper and lower bounds of the q-th variable. If the current optimal value is already close to the global optimum, this change does no harm, since the mutation is applied to a loser particle, which is not leading the movement of the whole swarm; otherwise, the chosen agent restarts its journey from the boundary and has a chance to escape from a local optimum.
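The mutation step can be sketched as follows (function and variable names are ours, for illustration only):

```python
import numpy as np

def mutate_agent(X, losers, lower, upper, rng):
    """CSO-MA mutation: pick one loser particle p and one coordinate q at
    random and reset x_pq to either its lower or its upper bound."""
    p = rng.choice(losers)           # a randomly chosen loser acts as the agent
    q = rng.integers(X.shape[1])     # randomly chosen coordinate to mutate
    X[p, q] = upper[q] if rng.random() < 0.5 else lower[q]
    return X

# illustrative usage on a 2-dimensional design space [0, 10] x [0, 1]
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(6, 2))   # 6 particles, none on the boundary yet
lower, upper = np.array([0.0, 0.0]), np.array([10.0, 1.0])
X = mutate_agent(X, losers=[1, 3, 5], lower=lower, upper=upper, rng=rng)
# after mutation, exactly one coordinate of one loser sits on a boundary
```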
The computational complexity of CSO is O(nD), where n is the swarm size and D is the dimension of the problem. Since our modification only adds one coordinate mutation operation per iteration, the computational complexity of CSO-MA is the same as that of CSO. The improved performance of CSO-MA over CSO in finding the optimum of many complex multi-dimensional benchmark functions has been validated [60].
where a is the dispersion factor. Apart from an unimportant additive constant, the log-likelihood l = log L is a sum of two terms. Recall that µ ∼ N(η, Λ), so the first term does not involve t_j because the second moment of µ_2 is just η_2² + Λ_22, where η_2 is the second element of η and Λ_22 is the (2, 2)-th element of Λ. Hence optimization with respect to t_j only involves the second term and, according to Theorem 1 in [32], it is finite. It requires the computation of

E(µ_2 |y) = ∫ µ_2 f(µ|y) dµ ∝ ∫ µ_2 f(µ, y) dµ.

Recall that for fixed i and j, we have

f(y_ij |µ, Σ^{−1}, σ^{−2}) = ∫ f(y_ij |θ_i, t_j, σ^{−2}) f(θ_i |µ, Σ) dθ_i,

where θ_i = (log V_0i, log c_i, log δ_i)^T is defined in equation (6) in the paper. The density f(y_i |µ, Σ^{−1}, σ^{−2}) can be approximated by numerical quadrature (for a fixed design t_ij, j = 1, …, T), where T is the number of time points (T = 16 in our paper) and K is the number of quadrature points. We found that K = 5 is sufficient to obtain stable results. We denote this approximation by F_1(t, y_i, µ, Σ^{−1}, σ^{−2}, K). Next, we use a sampling scheme to draw i.i.d. samples (µ_i, Σ^{−1}_i, σ^{−2}_i) for i = 1, …, s, and hence approximate the design criterion.
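To illustrate the kind of quadrature approximation used here, the sketch below applies Gauss-Hermite quadrature with K = 5 nodes to a one-dimensional conjugate toy model whose exact marginal is known in closed form. It illustrates the technique only; it is not the HIV-model computation itself, and all names are ours:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def marginal_density(y, mu, tau, sigma, K=5):
    """Approximate f(y) = int N(y; theta, sigma^2) N(theta; mu, tau^2) d theta
    with K-point Gauss-Hermite quadrature, after the change of variables
    theta = mu + sqrt(2) * tau * z."""
    nodes, weights = hermgauss(K)
    theta = mu + np.sqrt(2.0) * tau * nodes
    return float(np.sum(weights * normal_pdf(y, theta, sigma)) / np.sqrt(np.pi))

# for this conjugate toy model the exact marginal is N(mu, tau^2 + sigma^2)
approx = marginal_density(y=0.3, mu=0.0, tau=1.0, sigma=1.0, K=5)
exact = normal_pdf(0.3, 0.0, np.sqrt(2.0))
print(approx, exact)  # the two agree to within about 1% even with K = 5
```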

Figure 1
displays the sensitivity functions of the two CSO-MA generated designs for the two examples. Since T max = 10 hours and the time interval is divided into {1, 2, …, 10}, there are 1023 (= 2^10 − 1) possible sets of time points at which to observe a subject over the 10-hour period if the time unit is an hour.
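The count of 1023 candidate observation sets follows directly by enumerating the nonempty subsets of the 10 possible time points:

```python
from itertools import combinations

T_max = 10
time_points = range(1, T_max + 1)
# enumerate all nonempty subsets of {1, ..., 10}
n_subsets = sum(1 for k in range(1, T_max + 1)
                for _ in combinations(time_points, k))
print(n_subsets)  # 1023, i.e. 2**10 - 1
```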

Fig. 1 :
Fig.1: The sensitivity functions of the CSO-MA generated designs for Application 1 (first row) and Application 2 (second row) versus the design space comprising subsets of possible time points when they are appropriately ordered (right) and when they are not (left).

Table 1 :
The CSO-MA generated designs for the 5 examples.
we use the Metropolis-Hastings algorithm to draw random samples from f(µ|y) and to estimate E[Var(µ 2 |y)] and E[Var(µ 3 |y)]. Our algorithm for searching for the Bayesian optimal design for this HIV dynamics model can be summarized as follows. The percentages in parentheses are computed as follows, using the first one, 31.6%, as an example. The smallest value of E[Var(µ 2 |y)] among the first 4 candidate designs under π 1 is 0.02882. Thus 100 × (1 − 0.0197/0.02882) = 31.6% is a conservative c-efficiency gain that the Bayesian CSO-MA generated 16-point design has over the 4 candidate designs for estimating E[Var(µ 2 |y)]. The other percentages in parentheses in the last column can be interpreted similarly.
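A random-walk Metropolis-Hastings sampler of the kind mentioned above can be sketched on a toy one-dimensional target; the actual posterior f(µ|y) requires the approximations described in the appendix, and all names here are illustrative:

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, step, rng):
    """Random-walk Metropolis-Hastings: propose x' = x + step * N(0, 1)
    and accept with probability min(1, target(x') / target(x))."""
    x, lp = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        prop = x + step * rng.normal()
        lp_prop = log_target(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept/reject step
            x, lp = prop, lp_prop
        samples.append(x)
    return np.array(samples)

# toy target: standard normal log-density (up to an additive constant)
rng = np.random.default_rng(3)
draws = metropolis_hastings(lambda x: -0.5 * x ** 2, x0=0.0,
                            n_samples=20000, step=1.0, rng=rng)
print(draws[5000:].mean(), draws[5000:].var())  # roughly 0 and 1
```

In the design search itself, such draws from f(µ|y) are used to estimate posterior variances like Var(µ 2 |y) for each candidate set of time points.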

Table 3 :
Bayesian optimal designs found by CSO-MA for minimizing E[Var(µ 2 |y)] and E[Var(µ 3 |y)] under priors π 1 and π 2, along with their simulated criterion values in the last column; the percentage in brackets shows the relative decrease in criterion value compared with the minimum values reported in Table 2.
Fig. 2: Design points from the Bayesian 16-point exact designs found by CSO-MA for estimating E[Var(µ 2 |y)] (first row) and for estimating E[Var(µ 3 |y)] (second row). Both are determined under the prior density π 1. The next 8 rows display the design points of the 8 candidate designs in Han and Chaloner (2004).

Table 4 :
Average simulated criterion values of the locally D-optimal designs found by different algorithms, along with their standard deviations in parentheses.The simulated results suggest that CSO-MA outperforms the other 3 algorithms in terms of criterion values, standard errors and average runtime.

Table 5 :
A CSO-MA generated 16-point design for Poisson model 2. Each row represents a design point with the last column showing the weight of the design at each point.