Model predictive control for formation reconfiguration exploiting quasi-periodic tori in the cislunar environment

Given the numerous possibilities that formation flying space missions can enable, being able to design and govern relative trajectories in this scenario is fundamental. Particularly interesting, due to the installation and operation of the Lunar Gateway, which will represent the next human outpost in the cislunar environment, will be the exploitation of formation flying missions in the vicinity of this lunar station. The nonlinear dynamics by which the Earth-Moon system is characterised offers the possibility to find bounded relative trajectories which can be used to design the formation. In order to best exploit the formation potential, some reconfiguration manoeuvres can be used, which by changing the relative geometry can increase the versatility and adaptation of the mission. In this paper, a Model Predictive Guidance and Control strategy is proposed and applied to perform rephasing manoeuvres in the harsh environment of the Near Rectilinear Halo Orbits. By including first a limited thrust constraint and then a collision avoidance, a more mission-oriented approach is provided to the system. To further increase the robustness of the on-board algorithms, an adaptive logic is provided to the different tuning weights involved in the Model Predictive Control scheme. In this way, a more flexible system is obtained, which is capable of optimally working also in the presence of a high-fidelity simulation scenario, including discrepancies with the simplified on-board dynamical model.


Introduction
In the space research community, a strong interest has always been posed to the relative dynamics of two objects flying in close proximity. Indeed, this peculiar dynamical condition applies to a large number of mission scenarios, including both multiple cooperating spacecraft and a single probe orbiting a different uncooperative object, such as a small solar system body or a space debris. In the recent years, however, this kind of operational conditions keeps becoming more and more interesting, due to the vast possibilities that are unleashed by the exploitation of smaller, thus cheaper, spacecraft working and acting together as agents of a constellation [1] or formation [2][3][4]. Employing a fleet of flying objects is advisable for many different sci-entific or Earth Observation purposes, due to the possibility of exploiting stereo-vision with the possibility to control and adapt the relative distance. To benefit from such high versatility, the on-board avionics shall be capable of handling the complexities that the harsh environment poses to the system.
The dynamics governing the relative position and velocity between two objects orbiting in the same gravitational field can be expressed in many different fashions and during the years various models have been used, with an increasing level of complexity. Starting from the very first Clohessy-Wiltshire linearised model [5] presented in the sixties for rendez-vous guidance formalisation and going to the full nonlinear model with the effects of Earth non-spherical gravity field as in [6], it becomes more and more difficult to develop algorithms to synthesize controlled trajectories. Moreover, if the two flying objects are located in a multibody gravitational regime, the complexity is further exacerbated, due to the chaotic nature of the dynamical system [7][8][9] for which small deviations in the initial conditions of a spacecraft grow exponentially, leading to completely different outcomes.
A very challenging space mission scenario that has been brought into attention during the last decade is the Lunar Orbital Platform -Gateway (LOP-G) [10], which will be a human outpost in the cislunar environment. The lunar Gateway will fly a Near Rectlilinear Halo Orbit (NRHO) which has been deemed the most suitable environment for its extremely high performances in terms of stability. Relative dynamics in this multi-body gravitational environment has been directly addressed only recently (see e.g., [11]), where the trajectory analysis to perform rendez-vous with the LOP-G has been studied deeply. Additional possibilities to be investigated concern the exploitation of the quasiperiodic motion provided by the N-dimensional relative invariant tori, which provide a region in the phase space that remains bounded. In such a way it is possible to exploit the nonlinear dynamics to keep a stable formation naturally [12,13].
In order to exploit such interesting features of this environment in a real mission scenario, given also the high sensitivity of the nonlinear dynamical system to the initial conditions, it is fundamental to employ specific on-board algorithms able to cope with the complex dynamics. The recent advancements in Guidance Navigation and Control in this environment (see [14,15]) went in the direction of increasing the space-craft autonomy. This topic is indeed fundamental to provide additional flexibility and versatility to space missions, increasing the performances and responsiveness of the agents while reducing the operational costs.
The goal of the paper is to provide a cost effective guidance and control strategy to perform reconfiguration manoeuvres, or rephasing, of a chaser spacecraft around a target, with specific application to the Lunar Gateway scenario. First, a scheme is built to target and move between natural relative trajectories around the Gateway orbit, exploiting the peculiar features of quasiperiodic invariant tori. Then, the design of the guidance and control scheme is tackled. To choose the most suitable strategy, importance is put on the three concepts of speed (computationally light algorithms), optimality (cost effective transfers), and robustness (against model errors).
The guidance law main objective is to determine the necessary trajectory and its potential correction to successfully arrive at the target position or orbit. In some specific cases, when the orbital transfer relies on ground optimization of impulsive maneuvers, a simple guidance algorithm with strong flight heritage is represented by the bi-impulse guidance or fixed time of arrival (FTOA), where the control action is determined to target and reach a specific point in a given Time-of-Flight (TOF). This approach has the advantage of being computationally light, as it does not require any closed loop logic, and has also a high flight heritage; however, it relies on linearized models and assumes impulsive manoeuvres, thus limiting the range of possible applications. Because the resulting control action is trivial, it does not necessitate any feedback control algorithm, other than the actual execution of the thrusting action for the small time window envisioned by the guidance.
In case of continuous or quasi-continuous control across the transfer arcs, a different approach must be adopted. The typical approach is the on ground computation of a reference trajectory, which is then targeted by the on-board controller. A very common type of onboard scheme is represented by the proportional integrative derivative controller (PID). This kind of strategy relies as well on linearized models, but possess also good reliability and versatility. In turn, the scheme does not rely on an optimality-driven approach, hence the overall cost tends to be higher than other more recent techniques.
Indeed, control techniques can embed the mathematical description of the dynamical model (or param-eters related to it), to approach a more optimal solution. Among them, well-known controllers are the linear quadratic regulators (LQR) [16] and the Floquet-Based Controllers [17]. The aforementioned controllers, although computationally light, rely on linear expression of the dynamics, preventing their usage in case of wide formations (where dynamics nonlinearities can be detected from spacecraft to spacecraft). A well-known scheme to deal with nonlinear dynamics is represented by sliding mode control, which demonstrated to be also a robust approach if properly designed [18]. Nevertheless, it typically provides non-optimal solutions, leading to overall higher costs. Recent studies explored the concept of zeroeffort-miss/zero-effort-velocity (ZEM/ZEV) controller which has been successfully employed for optimal spacecraft formation control. The main drawbacks are the loss of optimality in non uniform gravitational fields [19], and the lack of parameters to tune (reducing the adaptation margins to different scenarios). A good compromise is represented by the State Dependent Riccati Equation (SDRE) controller [20], where a linearization at each time step allows to leverage the LQR formulation and approach its optimality, while limiting the increase of computational cost. Previous studies showed that this approach leads to very good results in the scenario under study, especially when introducing adaptive weights of the cost function [21]; however, the study did not explore the sensitivity to errors with respect to the on-board dynamical model. Furthermore, the scheme itself does not allow a direct implementation of state and/or control constraints.
For these reasons, the paper introduces a Model Predictive Guidance and Control scheme (MPC). Although computationally heavier, the computational cost is mitigated by recasting the problem in the form of a Quadratic Programming (QP) problem. The new scheme is tested in the same fashion of the SDRE, comparing adaptive and non-adaptive weights in the cost function, and measuring the performance variation in presence of state-input constraints (maximum thrust level and collision avoidance between the spacecraft). The new controller's performance is also compared with the ones of the SDRE, assuming a perfect knowledge of the dynamical environment. Then, a high fidelity model is introduced to assess the robustness properties of both schemes, and identify the best controller for the studied scenario.
The paper is structured as follows. Section 2 briefly describes the dynamical model, the methodology to develop quasi-periodic tori in such framework, and the operative scenario selected for the study. Then, Sect. 3 presents the overall guidance and control scheme, and describes the derivation of the MPC and its constraints in quadratic and linear form respectively. The results of the analyses are presented in Sect. 4 for what concerns the influence of the constraints, and in Sect. 5 for the assessment of robustness in the high-fidelity dynamical model. Finally, critical analysis and comments about the study and its results are drawn in Sect. 6.

Formation design
This section introduces the models and methods for the design of the reconfigurable formation. The study considers a leader-follower formation type in the Earth-Moon binary system, taking the Lunar Gateway and the Orion capsule as spacecraft. To build the formation, the approximated model of the Circular Restricted Three-Body Problem (CRTBP) is adopted (Sect. 2.1), such that periodic and quasi-periodic natural motion can be easily identified. Then, it is assumed that the main spacecraft (Gateway) is located on a periodic orbit, and that the Orion capsule flies in formation around it, on a quasi-periodic natural motion described in the CRTBP as Quasi-Periodic Invariant Tori, or QPT (Sect. 2.2). The operative orbit and QPT are then obtained and described in Sect. 2.3.

CRTBP dynamics
The design of trajectories to host the formation is based on the Circular Restricted Three-Body Problem (CRTBP) [22], as it provides an autonomous dynamics system, not explicitly dependent on time. This allows one to easily define continuous families of periodic and quasiperiodic natural motion. When switching to more accurate models, such trajectories are not natural anymore, but the deviations will be small enough to require a limited station-keeping effort. The CRTBP predicts two massive attractors describing circular orbits around their center of mass. The third body (in this case, the spacecraft) is assumed to have a considerably lower mass not to interfere with the natural motion of the massive bodies. With these assumptions, the equations of motion (properly made non-dimensional according to the attractors' mass, distance and characteristic time) read: where the right-hand side terms are the spatial derivatives of a pseudo-potential function U , defined as: with μ = m 2 m 1 +m 2 being the ratio between the smaller attractor of the binary system, and the overall system's mass, and r 1 and r 2 being the third object's distance from the primary and secondary attractors, respectively.

Quasi periodic tori
Quasi periodic invariant tori are closed surfaces that describe the bounded subspace covered by specific quasiperiodic, non-resonant trajectories (i.e., a quasi periodic trajectory in the CRTBP would exactly descr ibe a QPT if propagated for an infinite time). Differently from their periodic counterpart, they are associated with two angles (θ ) and relative frequencies (ω): • Longitude (θ 0 , ω 0 ): direction of the main motion which follows the reference trajectory. • Phase (θ 1 , ω 1 ): secondary direction, describing the winding motion around the main trajectory.
The ratio of the two frequencies is an irrational number, causing the trajectory to never close itself. Also, a useful parameter to characterize the torus is the rotation number, which measures the change in the phase angle at every orbital turn (i.e., at 2π variation of the longitude). The development of QPT families follows the typical Initialization-Correction-Continuation scheme adopted for generating periodic orbital families in the CRTBP [23,24]. The family is initialized by creating a grid of states around the reference periodic trajectory, at a specified time, leveraging the center manifold of the latter. Such grid is then corrected by enforcing the following equations: which respectively ensure that the final states (propagated for one orbital period) belong to the same stroboscopic map (the locus of states with same longitude and variable phase), that such map is fixed in space (avoiding drifts along the two dimensions of the torus), and that the overall torus' period matches the one of the periodic orbit (or, equivalently, that the longitude frequency does not change). Notice that the last constraint is included to limit the search of QPT families to the ones suitable for formation flight, as having the same longitude frequency as the reference periodic orbit ensures the absence of secular drifts between the agents of the formation. Finally, the continuation is typically performed through perturbations of the previous solution along tangent direction of the family. More details about the development of QPT families can be found in [12,13,25,26].

Operative scenario
The Lunar Gateway is located on a Nearly Rectilinear Halo Orbit (NRHO), with a resonant period with the Moon (9:2), a perilune and an apolune of 3200km and 70000km approximately, and an orbital period of 6.5d. The orbit is reconstructed leveraging the initialization, correction, and continuation scheme of orbital families, starting from the bifurcation point between Lyapunov and Halo orbits [27]. Figure 1 depicts the operative NRHO in the CRTBP and in a high fidelity model, including Earth and Moon ephemerides, Sun fourthbody gravity, and Solar radiation pressure. It can be observed how the high-fidelity motion mildly deviates from the CRTBP solution (∼ 2300km at the apolune in a time range of 30 days), thus substantiating the validity of the latter model for the analyses.
The selected QPT to locate the follower spacecraft (the Orion capsule) is depicted in Fig. 2 around the NRHO, and in a relative-state form with respect to the Gateway.
The operative torus is characterized by an excursion of the relative distance between the formation's agents from 102km to 1824km. The orbital period is the same of the reference NRHO, coherently with the constraint of Eq. (6), and has a rotation number of 46.9 • per orbit.  To reconfigure the formation, transfer arcs are foreseen for the follower spacecraft (the Orion capsule), that depart from the torus surface and arrives at the same surface on a different location. The final point is defined by the target phase angle, while the longitude angle evolves linearly in time as both spacecraft move along their respective orbits. Notice that, to avoid relative drifts during the transfer, it must be ensured that the transfer "time of flight" (ToF) matches the natural longitude drift of the leader spacecraft along the reference orbit, i.e.:

On-board controller design
Previous work demonstrated the relatively low cost of impulsive reconfiguration maneuvers leveraging phase targeting on the toroidal surface, thanks also to the symmetry properties of QPT [26]. Figure 3 depicts optimal impulsive reconfiguration costs to ensure a 180°phase shift, given a ToF of 48h (compatibly with manned operations for approaching the Lunar Gateway), on the operative torus described in Sect. 2.3. Limited costs are observed across the whole torus, with peaks below 15ms −1 . Nevertheless, such optimal transfers rely on ground-based optimization and planning, and require systematic refinements to accommodate the differences between the CRTBP model (on which the maps are based) and the actual real-world dynamics of the formation. Frequent contacts between ground and the formation may result in more complex and expensive operations (especially if formations of several agents are employed), therefore it is desirable to have a (computationally) light and effective on-board control model, which allows a follower spacecraft to perform a full reconfiguration transfer with minimal data exchange with ground.
The present section describes a closed loop scheme for formation reconfiguration maneuvers, with emphasis on the actual controller design, based on a Model Predictive Control (MPC) strategy, showing its stren gths and weaknesses. Furthermore, a classical and an adaptive design is proposed, to highlight the benefits from the adaptive approach to the studied scenario.

Closed loop scheme
The implemented loop consists of a main block comprising the Guidance and the Control (G&C), an Inputs block which includes all the parameters needed by the G&C, and a Plant describing the dynamics of the spacecraft. Figure 4 depicts the three blocks, hereafter described in detail.

Inputs
The inputs block collects the full set of parameter on which the G&C loop is based. First, Orion's technical characteristics are required to provide the true control action. Some assumptions are made within this study: • Maneuvers are small enough not to cause a relevant variation in the wet mass of the spacecraft. • Engines are able to provide a fixed thrust value • A discrete control system is employed.
Such assumptions allow one to initialize fixed values of mass m and thrust F for the whole transfer time. In particular, a mass of 25855kg, corresponding to the full wet mass of the spacecraft, is considered. Also, limited thrust capabilities are planned for the analyzed scenario: it is assumed that only auxiliary thrusters are available for the maneuvers, and that only half of them (four) are active, delivering a total thrust of 4 × 490N [28]. The combination of high mass and low thrust provides an overall acceleration of 0.076ms −2 , and represents a worst-case scenario, to guarantee the effectiveness of the developed control scheme in any spacecraft condition. Finally, a sampling time of T s = 600s is here set for the update of the discrete control system. This value is derived from tests at various frequencies, to minimize the effort of the control system, while ensuring the capability of reaching the target state. Table 1 summarizes the platform parameters used for the transfer.
Then, it is required to provide the information needed to reconstruct the time-varying target state on the torus. To avoid the computational burden deriving from numerical propagation, the torus surface and its angular parameters are leveraged to define the motion of the target state over time; this corresponds to a specific quasi-periodic orbit associated with the target phase value (θ 1 f ), hence a single parameter is needed from the guidance. Notice that to identify a specific state along the target trajectory, the longitude angle would be also needed; however, the guidance works on the local stroboscopic map at each time step (as will be explained in the next paragraphs), thus imposing the same longitude as the one of the spacecraft itself.
Finally, the full state of the leader spacecraft (x re f ) is needed as well. Indeed, the parametrization of the torus is done in the context of the CRTBP, and its direct exploitation in a more realistic dynamics scenario would lead to a drift between leader and follower spacecraft of the formation. To work around the problem, the parametrization is performed to the relative torus (where the states are expressed relative to the leader spacecraft), then the true state of the leader is added. This strategy anchors the torus' states around the leader regardless of the real dynamics, although it demands an a-priori knowledge of the leader state evolution, which shall be provided to the follower spacecraft.
On-Board G&C The G&C block represents the core of the study, and it is mainly composed of three subblocks: The guidance block takes the phase of the desired final trajectory as input, and provides the time-varying point (on such trajectory) to be tracked during the transfer. To always ensure the maintenance of the formation, regardless of the controller architecture, the target point shall always belong to the same stroboscopic map of the follower spacecraft. In this way, the transfer may take variable time, and yet prevent relative drifts between spacecraft. For this reason, the target point can be uniquely identified along the target trajectory (at phase θ 1 f ) by setting the longitude as with θ 0i being the longitude of follower and leader spacecraft at the beginning of the transfer. Given the two angles, the target state can be extracted through a proper parametrization of the toroidal surface. In the present work, the torus is parametrized using 2D cubic splines, from a pre-defined grid of phase and longitude values. The resulting interpolating function, s(θ 0 , θ 1 ), maps the R 2 space of the torus' angular variables to the R 6 state space in the CRTBP rotating frame, and relatively to the leader spacecraft. The angles defining the grid do not follow the natural evolution of trajectories on the torus. For this reason, the correct reconstruction of the target phase over time can be attained by including the winding frequency of the torus, i.e.: with θ 1f (0) being the initial target phase value. Considering the real state of the leader spacecraft , the guidance block returns the final state to be targeted by the controller as: The controller block defines the law that computes the needed control action from the current error (the difference between the target state and the current state). Since the target state is not a constant quantity, the controller must solve a reference tracking problem [16,29,30]. In the present work, a Model Predictive Control scheme is employed (the details of the design are explained in Sect. 3.2). It is here important to stress that the output control action is an impulsive v with a variable magnitude, which cannot be provided by the spacecraft engines in the current form.
In this regard, the pulse-width modulation block is implemented to translate the variable input into a fixed value, which matches the engine characteristics (Table 1). At every sampled time the controller returns the required control in the form of an impulsive v. The pulse-width modulation translates the impulsive value into an equivalent finite-time maneuver by providing the correct thrust time according to the achievable acceleration of the spacecraft, namely: whereũ is the fixed acceleration of the platform. Consequently, at each interval between consecutive sampled times (from t 0 to t 0 + T s ), the spacecraft will perform a powered branch first (from t 0 + T u ), then a free drift through natural dynamics until the next time sample (from t 0 + T u to t 0 + T s ). Plant The plant block describes the "true" dynamics of the chaser spacecraft in the binary system. The dynamics propagation is split in two sub-blocks to take into account the alternate powered branches and coasting branches at each sampling time. In the present work, two dynamics models are employed: • CRTBP.
The CRTBP is used to simulate a perfect accuracy of the on-board model with respect to the real dynamics. This allows a direct evaluation and comparison of the controller performances with respect to the optimized impulsive transfers (Fig. 3), to address the feasibility of the proposed architecture. Then, high-fidelity dynamics (with true positions of the attractors, and with perturbation from Sun gravity and Solar radiation) is taken into account, to verify the robustness of the scheme against dynamics model errors.

MPC formulation
The Model Predictive Control scheme computes a control action as the result of a minimization of a cost function, which is directly discretized within a specific time horizon, and solved through a numerical optimization algorithm [31]. Hence, at each time step, a discrete prediction window is generated, for which a finite-time control sequence is searched. Generally, only the first control action from the found sequence is executed, and a new time window is generated afterwards to repeat the procedure. This allows the MPC to adapt its response to the variation of the real dynamics from the on-board model. Furthermore, the numerical optimization executed in the MPC scheme allows one to include state or control constraints, making this approach more versatile than simpler schemes such as Linear Quadratic Regulator. Indeed, the MPC scheme demonstrated to be successful for various applications in the field of autonomous GNC, for both UAVs and spacecraft [32,33]. With the clear objective of developing a light, on-board implementable control scheme, the design process described hereafter addresses three main aspects: 1. Management of the Time Horizon for the optimization. 2. Formulation of Objective and Constraint. 3. Tuning of the cost function's Weights for the optimization problem.

Time Horizon
There are two commonly employed schemes for the management of the controller's prediction window, namely the Receding Horizon and the Shrinking Horizon.
The Receding Horizon scheme is based on a fixed size window that moves at each time instant, shifting its start and end points by one sample time [34], as represented in Fig. 5.
Such scheme is continuously looped until a certain stopping criterion is satisfied, e.g., when reaching a specific relative chaser-target distance. The nature of this approach does not allow a direct control over the ToF; however, accuracy and speed of the process can be easily tuned by properly setting the length of the time window, and the number of steps within it.
The Shrinking Horizon scheme [15,35], instead, leverages a variable-dimension prediction window where the initial time shifts forward, while the final time is fixed to the instant of desired transfer completion (Fig. 6).
The main advantage is a direct control over the ToF; nevertheless, this approach may require several points within the time window, if the transfer time is long, and increasing the time step between the points may lead to large inaccuracies in the optimization process. Given the greater flexibility of the Receding Horizon approach (in terms of accuracy vs. computational cost), this scheme has been selected for the controller design of the present work.

Objective and Constraints
The nonlinear formulation of the reference tracking optimal control problem (with discrete quantities) reads: Equation (12a) is the cost function to be optimized, in the Bolza Problem form. Here, x and u are the relative state and relative control between the spacecraft (x, u) and the target point (x T , u T ) respectively, while Q and R are the corresponding weight matrices. Equation (12b) describes the on-board implemented dynamics, where f( x k ) and g( u k ) are the collection of nonlinear terms, function of the state and of the input respectively. Equations (12c) and (12d) are the set of equality and inequality constraints, respectively. Notice that, according to the problem analyzed in this paper, the target point is defined for each time step as Eq. (10), and the corresponding control action is null (u T = 0), being the target a natural trajectory in the dynamics model of the controller (CRTBP). Also, the control actions within the MPC are modeled as impulsive maneuver, therefore they can be expressed as where δ(t k ) is the Kronecker's delta. Problem (12) could be directly solved within a MPC scheme by using a Nonlinear Programming methodology; however, this typically leads to high computational burden and makes the overall process unsuitable for on-board implementation. To make the problem more tractable, modifications are introduced to recast the problem in a Quadratic Programming form.
First, the nonlinear dynamics of Eq. (12b) is recast as a sequence of local linearizations, to avoid numerical integration of the state and of the State Transition Matrix (STM). At each time step, the linearized dynamics reads: Approximating the exponential terms as I + A k (t k+1 − t k ), and recalling Eq. (13), the expression (14a) becomes Then, a generic (k+j)-th state can be expressed as function of the initial state x k and the sequence of control actions from initial node to the (k+j-1)-th node If the expression (16) is computed for all states of the prediction window (from 0 to n p ) the full stack of the states X reads Similarly, the cost function (12a) can be expressed through the stack of states and inputs as min where all the weights have been set equal for all nodes (Q k = Q n p = Q, R k = R).
After substituting the stack of states from (17) in the cost (22), a quadratic form as function of the control stack U is obtained, and reads 1 min with 1 The additional, constant term 1 2 x k A kQ A k x k would be present in the final cost expression; however, it does not affect the minimization process, hence it can be omitted.
Regarding the constraints (Eq. (12c) and (12d)), a linearization is required to fully define the Quadratic Programming Problem. In the present study, no equality constraint is included, while two inequality constraints are considered: • Maximum thrust.
The maximum thrust constraints ensures that every v k does not exceed the value that can be provided by the fixed thrust within a full sample time T s , namely: The constraint is nonlinear as it involves the 2-norm of a vector. Hence, it needs first to be linearized, according to the formulation of the Quadratic Programming Problem. In particular, a simplification is introduced by imposing that the ∞-norm of every v k in the prediction window, i.e., every component of the control stack, shall be lower than the right term of Eq. (28) divided by the square root of three, namely: Notice that this formulation makes the constraint anisotropic. In fact, any control action aligned with one of the Cartesian directions is required to be less than the actual maximum control that the spacecraft can provide; nevertheless, when the control vector is aligned with the trisectrix of any octant (defined by the three Cartesian axes), it is ensured that its 2-norm stays below the max value: The collision avoidance constraints imposes a minimum distance between the leader and the follower of the formation for the whole transfer arc. In general, the distance constraint is expressed in nonlinear form as follows: where C :=[I 3×3 0 3×3 ] , while R K O Z is the radius of the Keep Out Zone (KOZ) sphere. To embed the constraint in the Quadratic Programming formulation, a linearization is again needed. In this regard, the paper exploits the methodology proposed by Morgan et. al. [36]. The KOZ sphere is approximated into a plane tangent to it, and normal to the x k+i vector. In such a way, the constraint can be expressed in a convex form as per Eq. (32), leveraging the (known) initial value of the prediction window, x k .
Note that, to express the constraint as an upper boundary inequality, the sign of both sides of the equation have been inverted. Equation (32) is applied to each step of the time window, hence the constraint can be formulated as function of the control stack, leading to the following expressions: with where the symbol ⊗ denotes the Kronecker product. Finally, the complete problem, with quadratic cost function and linear constraints, reads: (36) The algorithm that solves Problem (36) returns the full stack of optimal control U opt ; however, only the first control is applied before a new optimization is performed.

Weight Tuning
The solution of Problem (36) is strongly dependent on the selection of weights Q and R.
The common approach is to search for Fixed Weights which ensure the completion of the transfers with the desired performance. In this paper, the main objective is the minimization of costs and the completion of the transfers within a specified ToF of 48h. The transfers are considered completed when the position error between follower spacecraft and target point falls below 2km. The weights are obtained through an optimization process as well. To set up the optimization, some simplifications are introduced. Considering that state (Q) and control (R) weights act on a cost function to be minimized, we can set R equal to identity matrix and tune Q relatively to it without any loss of generality. Furthermore, no coupling between the state terms is considered, therefore Q is a diagonal matrix. As an additional simplification, all the diagonal terms associated with the position terms of the state are assumed equal, and the same assumption is applied to velocityrelated weights as well. As a result, the weight arrays read: where q r and q v are the two scalars to be tuned and optimized. Then, the optimization problem reads where v is here the overall scalar cost of the transfer, and T is the time required to reach the target point within a specified tolerance. Despite the simplicity of the approach, this strategy suffers from a lack of robustness against deviations in the trajectory, for example due to errors in the dynamics model. Furthermore, the fixed values may not capture variations in the controller behavior during each transfer, as a consequence of large relative position displacements and changes in speed. For these reasons, the paper also implements a strategy to modify the weights on-board while performing the transfer, leveraging the adaptation law developed in [21] for a State Dependent Riccati Equation controller in the same scenario. The Adaptive Weights strategy is designed to require a single initial tuning of some terms, and provides efficient transfers of the spacecraft anywhere along the quasiperiodic torus. In particular, the position terms q r of the weight matrix Q are scaled through a coefficient which is directly related to the spacecraft-target relative state, and to the available time to complete the transfer within the ToF: with q M AX being a user-defined upper limit for the weight, and γ being the adaptation coefficient, explic-itly defined as: Here, r and ṙ respectively represent the radial distance and speed from spacecraft to target point, r T is the distance threshold which determines the end of the transfer, and t is the time passed from the beginning of the transfer. As a result, this scaling law makes the control action smaller at large distance from target and at recently begun transfers, to avoid large initial control actions, while it brings the weight closer to its max value when approaching the target and/or when the time left is short. In addition, the local evaluation of radial relative velocity adapts the coefficient to avoid excessively slow or excessively high approaching speed. In addition, a reduction step saturation β from consecutive sample points is set to avoid large control cost increments caused by local low approach speed of the target ( [21]): where β 0 denotes the β exponent from the previous step. The velocity-related weights (q v ) are instead tuned once and remain constant across the transfer. Figure 7 depicts the update scheme of the weights according to the adaptation law. Overall, the initial parameters to be set are: The present study develops the transfers and maps the costs for both strategies, highlighting the advantages of the latter technique.

Simulations and results
This section reports on the performance (in terms of maneuver cost) of the formation reconfiguration transfers leveraging the closed-loop scheme described in Sect. 3. The first part shows a performance comparison between a MPC with Fixed Weights and with Adaptive Weights. Here, only the maximum thrust constraint is considered in the MPC optimization process, to decouple the effect of CAM from the intrinsic differences of the weights management. Furthermore, the CRTBP dynamics model is leveraged as both on-board and "real" dynamics, to avoid cost variation from model errors.
Then, collision avoidance is included in the controller, to highlight the cost variation and assess the validity of the weights tuning in the presence of large trajectory detours.
As a final part of the study, high-fidelity dynamics are introduced to propagate the spacecraft state, and the robustness of the controller is tested in the presence of model errors.
All the previous analyses are performed with a parallel comparison with a SDRE control approach, to address the advantages and disadvantages in exploiting an MPC scheme for the case study.

Fixed weights vs adaptive weights
Before proceeding with the analysis, it is important to mention a particular behavior that characterizes the onboard scheme with respect to the offline optimization of the transfer of Figure 3, regardless of how the weights of the cost function are designed. Because of the localoptimizing nature of the MPC, the scheme is not able to locate the best points where to maneuver, and it will provide a control action at every sampled point for the whole transfer. This leads to unacceptably high costs when the spacecraft is flying across the perilune of the NRHO, as the high speed and low distance from the Moon make most of the maneuvers very expensive. To work around the problem a limiter in the ToF is here implemented, which always ensures the completion of the transfer before crossing the perilune, for all transfers that foresee a perilune passage within the default ToF of 48h.

Weights Setup
The first step of the analysis of the MPC scheme involves the setup of the fixed and of the adaptive weights.
The fixed weights approach required a dedicated tuning for every transfer, according to optimization problem (38). The output is a map of position-and velocityrelated weights as a function of the initial angular coordinates of the transfer along the torus, as depicted (in logarithmic scale) in Fig. 8.
Ranges of approximately 10 0 − 10 3 are observed across the torus, with peaks nearby perilune, and significantly lower value on the rest of the surface. The velocity weights follow the same trend of the position weights, with values scaled down to the 10 −2.4 −10 −0.7 range. Their lower values ensure a prompter transfer to achieve the target point within the time limit, but provide a minor control over the relative speed, preventing overshooting issues. In general, negligible to no influence of the initial phase was observed on the weights, and minor scattered variations can be attributed to the optimization process and to the tolerances associated with it. It is, however, important to stress how such minor variations do not have macroscopic influence on the costs for the transfer, as it will be shown later in this section.
The adaptive weights setup consists of a single tuning process, which is done for the whole transfer map at once. First, the maximum position weight (q M AX ) and the velocity weight (q v ) are tuned to ensure transfer times below the limit, and minimize the v over the entire torus as much as possible. This is done repeatedly for different values of β 0 , to find the initial exponent value which improves performance the most. The pre-  vious search is done with a total freedom of the coefficient β to adapt to any value (i.e., without setting the exponent limiter β). This inevitably leads to local isolated peaks due to the β overshoot problem described in Sect. 3.2. In the final step of the design, the limiter is introduced and modified until the isolated cost peaks disappear. Following the described procedure for the case under study, the parameters reported in Table 2 are obtained: Notice that the maximum position weight is far higher than the fixed weights case. This is explained by the variability of the actual weight along the trans-fers, which starts from very low values at the beginning (large distance from target), and needs to be very high at the end (to complete the transfer before the time limit when the spacecraft is very close to the target point).

Performance Comparison
The costs to reconfigure the formation using a fixed or adaptive weights approach are depicted in Fig. 9.
From a direct comparison of the costs from Fig. 9a with the optimized impulsive maneuvers of Fig. 3, the fixed weights approach shows an overall increment in costs, which however is fairly contained in the most suitable regions for performing the transfers (nearby the apolune and after passing the perilune). In particular, lower costs of ∼ 13ms −1 are observed in the apolune surroundings, centered at phase values θ 1i of 90 • and 270 • . Here, despite the relative uniformity in the weights maps, a sensitivity is observed with respect to the phase, as the cost of maneuvers begun at the apolune raise above 20ms −1 when crossing θ 1i = 0 • and θ 1i = 180 • . This is directly related to the larger initial distance from target point at such phases, which leads to a higher control action when the weights are constant values (as it is for the fixed weights strategy). Cost rapidly ramps up to above 50 ms −1 when approaching the perilune, with peaks at ∼ 450 ms −1 . Such values suggest how a constantly executed on-board control is not a viable option when close to the Moon, and should be restricted to the rest of the orbit; nevertheless, it is worth stressing that the high cost region occupy a small portion of time (few hours) with respect to the orbital period of 6.5d, and therefore does not globally hinder the applicability of this control scheme.
The adoption of the adaptive weights mostly solves the initial distance-related issues, by providing a milder control action at the beginning of the transfer. This is directly observable in Fig. 9b: not only are the cost globally lowered (far from perilune), but also a lesser dependence on the phase is here measured. More in detail, the vast majority of transfers costs less than 20 ms −1 , with very low-costs regions (< 10 ms −1 ) after the perilune passage, nearly approaching the optimal impulsive maneuver cost. On the other hand, costs at the perilune appears to be far higher than the ones displayed by the fixed weights MPC, with peaks at > 800 ms −1 . This is caused by the direct dependence of the adaptation coefficient to the relative velocity from target, which achieves high values in this orbital region. Overall, the adaptive scheme demonstrated to be more effective than the fixed weights approach. Firstly, it allows, from a first design perspective, a faster tuning process which identifies a single group of parameters for all the possible reconfigurations along the quasi periodic torus. Secondly, its capability to reduce the control effort at larger distances from target, while ensuring the success of the transfer in the given time, makes the MPC scheme less sensitive to the follower spacecraft location around the leader, and globally more convenient in terms of cost (with the only exception of the perilune region, where maneuvers should be avoided).
By comparing the MPC-based transfers with previously developed SDRE-based reconfigurations [21], it is observed a mild increment of transfer costs, justified by the finite nature of the optimization window of the MPC, against the infinite-window formulation of the SDRE. The presence of a final time of the prediction window, shorter than the real final time allowed for the transfer, makes the MPC-computed control generally prompter than the SDRE's. This difference is more evident as the initial distance of the transfers increases, as observed in one of the most expensive transfers, depicted in Fig. 10.
Regardless, the cost increase is compensated by the capability of dealing with constraints. In particular, the thrust limit helped to avoid saturation of the control action that occurred in some cases for the SDRE control.

Collision avoidance
The great advantage of the MPC is the possibility to solve constraints in an exact way. In particular, collision avoidance is of the utmost importance when dealing with relative displacements of multiple spacecraft. Given the success of the adaptation of weights shown in Sect. 4.1, it is of interest to assess the capability of the adaptation law to withstand the CAM and the large deviation that the follower spacecraft may be subjected to when a KOZ is placed around the leader spacecraft. For the problem under study, a KOZ of 100km has been selected, to enforce CAM around to the majority of the transfers, compatibly with the natural distances of the torus from the leader (i.e., avoiding excessive and unjustified deviations). Figure 11 depicts the changes in the minimum distance between leader and follower for transfer with and without CAM. Despite the success in the CAM execution, it is of interest to address whether it hinders the completion of the transfers, and how the costs are affected by it. Regarding the former aspect, the adaptive weights MPC demonstrated its capability to complete the transfers in the vast majority of locations along the torus. The only exception is given by the perilune, where the difficulty of the transfer, due to large velocity differences between spacecraft and target point, is exacerbated by the path changes imposed by the CAM. Concerning the cost variation, Fig. 12 depicts the increment caused by the introduction of the KOZ. In general, the analysis of the formation transfers with distance constraint highlighted the good capabilities of the adaptive weights controller to withstand distance constraints between spacecraft. Recalling the sampled transfer from Figs. 10, 13 adds the new result obtained from the MPC scheme with CAM.
It can be observed how, despite large changes in the trajectory directions, the arrival point almost coincides with that of the MPC without CAM. In fact, the longer time taken to fly around the KOZ makes the adaptation law provide larger weights in the second half of the transfer trajectory, thus allowing the completion of the reconfiguration in nearly the same time.

High-fidelity dynamics
The previous sections proved that the proposed MPC scheme, with the adaptation of weights, is able to fulfill the requirements of the transfers, at feasible costs, and with the additional advantage of dealing with collision Fig. 13 Transfers modification due to collision avoidance. Circles represent initial points, while diamonds the end of the depicted trajectories avoidance. Also, the results showed similar costs than those of an SDRE scheme. Nevertheless, it is of the utmost importance to address the feasibility of such a scheme (and its differences with the SDRE) in the presence of discrepancies between the on-board dynamics model and the "true" dynamics. In particular, it is assumed to design the controller schemes in the context of the CRTBP, while the spacecraft dynamics is propagated in a higher fidelity model. This model comprehends the gravity from Earth, Moon and Sun, with their ephemerides-based motion, and the Solar Radiation Pressure.
First, the SDRE controller, with the setup proposed in [21], is tested in the new environment, to provide a baseline for comparison with the MPC scheme. Then, the MPC setup proposed in previous sections is introduced, and differences with the SDRE are highlighted. The performance of both controllers are evaluated in terms of fulfillment of the position constraint at the end of the transfer (final displacement from target below 2km).

SDRE
With reference to the setup of the SDRE in [21], the SDRE scheme is tested in the new high-fidelity environment. Figure 14 displays the constraint violation for all transfers in the new dynamics, with controller parameters optimized in the context of the CRTBP.
The first observable outcome is the inability of the SDRE controller of withstanding the changes in the dynamics itself, generating a large region of failed  transfer for most of the perilune-approaching part of the orbit, and some regions after the perilune. This demonstrates the high sensitivity of the adaptive weights SDRE controller with respect to the dynamical model. It is possible to partially recover the performance of such controller by tuning the higher threshold of the position weight (q M AX ), as shown in Table. 3.
With the new setup, it is possible to partially recover the old performance of the controller, despite the persistent infeasibility across the perilune region (Fig. 15).
Nevertheless, it shall be highlighted how this generally brings the costs of the transfer to a higher value, thus making the SDRE less convenient than the MPC scheme. Furthermore, this approach defeats the purpose of an automatized control design, as a dedicated tuning is required for each epoch, given the nonautonomous nature of the real dynamics of celestial bodies.

MPC
Following the same procedure, MPC scheme (including the CAM constraint) is tested in the high-fidelity dynamics of the Earth-Moon scenario. By exploiting the same tuned parameters from the CRTBP case, the maps of Fig. 16 are obtained.
It can be noticed how such control scheme behaves in a more robust way against changes in the dynamical model, with limited failure zones, all nearby the perilune region. Of particular interest is the fact that negligible cost variation was observed, with respect to the CRTBP-based cost map, and that the controller is still able to perform the CAM due to the KOZ from the leader spacecraft

Conclusions
The present work aimed at the design of a control scheme for formation reconfiguration around the Earth-Moon Nearly Rectilinear Halo Orbit, with application to the Lunar Gateway scenario. In particular, a Model Predictive Control is implemented to enable a direct management of collision avoidance and thrust constraints, and provide a scheme capable of withstanding model uncertainties. The MPC has been recast into a Quadratic Programming problem, to partially recover the main drawback of such scheme, the computational cost, and make it more competitive for a potential onboard implementation. To further improve autonomy, a self-adaptation of the cost function weights, developed in previous works for an SDRE controller has been extended to the MPC case. The study confirmed the success of the adaptation law also in the MPC case, given the faster setup and the generally lower costs with respect to fixed-value weights. The additional presence of constraints in the MPC formulation, and in particular the collision avoidance constraint, did not pose any problem to the adaptive weights formulation. Indeed, the variability of the weights along the transfer revealed to be effective in completing the reconfiguration within the available time, despite the large trajectory deviation posed by such constraint. Furthermore, the costs from MPCcontrolled transfers was observed to be similar to the ones of the SDRE scheme (although slightly higher).
Finally, emphasis was put on the robustness of the MPC with respect to the SDRE scheme, and of the adaptation of weights, in presence of discrepancies between the on-board and the true dynamics. Here, the true difference between SDRE and MPC scheme (both with adaptive weights) was observed. In fact, the SDRE revealed to be too sensitive to the changes in the dynamics, demanding a dedicated tuning and leading to inevitably higher costs. On the contrary, the MPC proved to be more robust, succeeding at the completion of transfers in the new, high-fidelity dynamics, with its parameters still tuned in the simpler CRTBP model.
Overall, the proposed MPC design demonstrated to be a suitable scheme to be embedded in future spacecraft approaching the Lunar Gateway, provided sufficient computational capabilities of the spacecraft, and, more important, provided a sufficiently accurate orbit determination process . As future steps, the navigation aspects will be addressed to quantify the needed accuracy not to hinder the stability of the GNC loop, then computational aspects of the process will be explored to preliminarily define hardware requirements for the implementation.
Funding The authors did not receive funding from any organization for the submitted work.

Data availability
The datasets analyzed during the current study are available from the corresponding author on reasonable request.