A regression analysis of the impact of routing and packing dependencies on the expected runtime

Problems with multiple interdependent components offer a better representation of the real-world situations where globally optimal solutions are preferred over optimal solutions for the individual components. One such model is the Travelling Thief Problem (TTP); while it may offer a better benchmarking alternative to the standard models, only one form of inter-component dependency is investigated. The goal of this paper is to study the impact of different models of dependency on the fitness landscape using performance prediction models (regression analysis). To conduct the analysis, we consider a generalised model of the TTP, where the dependencies between the two components of the problem are tunable through problem features. We use regression trees to predict the instance difficulty using an efficient memetic algorithm that is agnostic to the domain knowledge to avoid any bias. We report all the decision trees resulting from the regression model, which is the core in understanding the relationship between the dependencies (represented by the features) and problem difficulty (represented by the runtime). The regression model was able to predict the expected runtime of the algorithm based on the problem features. Furthermore, the results show that the contribution of the item value drop dependency is significantly higher than the velocity change dependency.

The standard TTP formulation in Polyakovskiy et al. (2014) considers a combination of the Travelling Salesman Problem (TSP) and the Knapsack Problem (KP). The problem considers a set of items scattered across different cities; a thief should visit each city once, picking some items on the way and returning to the first city, while trying to maximise the thief's gain. The dependency in this formulation can be modelled in a number of ways, such as:
• Penalising the travelling time by tying the thief's velocity to the knapsack load [standard model (Polyakovskiy et al. 2014)],
• Decreasing the value of items as the thief progresses in his journey,
• A combination of the above [bi-objective TTP model (Bonyadi et al. 2013)].
Note that, because the final aim is to cover aspects of real-world problems, both dependency types were designed to reflect realistic situations. First, the load-velocity dependency can be adapted to reflect the relationship between the load and fuel consumption in the transportation of goods (Işıklı et al. 2020). Second, the value drop dependency has even more realistic and important applications within the shipping sector, such as the transportation of perishable goods (Darestani and Hemmati 2019; Fatemi Ghomi and Asgarian 2019).
Since its introduction, the TTP has received the attention of researchers from the fields of evolutionary computation, metaheuristics, and operations research, mainly because the problem is easy to understand, yet challenging to solve. Several papers proposed heuristic solution methods (Mei et al. 2014; El Yafrani and Ahiod 2016; Wagner 2016; El Yafrani and Ahiod 2018), and fewer have attempted to analyse the problem empirically and theoretically (Wu et al. 2016; Wuijts and Thierens 2019). These analyses focus on the impact of problem features, with little to no attention to the impact of the dependencies between the components. Furthermore, all of the above-mentioned works only consider the velocity change constraint, which is probably a consequence of the standard TTP library only supporting this dependency.
The gap we aim to close with this work is the analysis of the above-mentioned dependency formulations and an investigation of their impact on the difficulty of tackling the problem. To conduct the analysis, we consider cost models as a tool to empirically analyse the difficulty of the generalised TTP model that embeds all these dependency models. The novelty here lies in imposing that the value of a picked item drops over time; in addition, the difficulty of problem instances is evaluated using a memetic algorithm based on MA2B (El Yafrani and Ahiod 2016). The findings show that the item value drop dependency significantly impacts the difficulty of solving TTP instances. More importantly, according to our analysis, its impact is stronger than that of the velocity change constraint.
In Sect. 2, we provide background information with a brief literature review on the algorithms and analyses for the TTP and on cost model-based fitness landscape analysis. Section 3 introduces the methodology adopted to analyse the problem, including the feature-based analysis and the adopted algorithm. Section 4 describes the experiments and discusses the results. Finally, Sect. 5 summarises the findings and concludes the paper.

The Travelling Thief Problem
The standard TTP can be informally stated as follows: Given are a set of cities and a set of items distributed among these cities. Each item is defined by its individual profit and weight. A thief must visit all the cities exactly once, pick some items while travelling, and return to the starting city. The knapsack has a limited capacity, which should not be exceeded. We also consider a knapsack renting rate (per time unit) which determines the amount that the thief must pay at the end of the journey.
What makes the two components of the TTP interdependent is the velocity of the thief, because it changes according to the weight of the knapsack. Specifically, the heavier the knapsack gets, the slower the thief becomes. The objective is to maximise the total gain function defined as the total profit, minus the cost of the journey.
Interdependence such as that present in the TTP can reflect the characteristics and complexity of real-world problems (Bonyadi et al. 2013). For this reason, several authors have addressed the TTP using different methods. Polyakovskiy et al. (2014) presented the first heuristics for solving the TTP, generating a TSP tour using the classical Chained Lin-Kernighan heuristic (Applegate et al. 2003) and, with the tour fixed, applying packing heuristics such as random local search (RLS) and (1+1)-EA to improve the solution.
As the problem is NP-hard and the objective function is nonlinear, many researchers focused on (meta-)heuristic algorithms. The work proposed by Bonyadi et al. (2014) introduces a method named CoSolver, which is inspired by cooperative coevolution. The framework consists of splitting the problem into sub-problems and tackling them in a parallel and synchronous fashion. Mei et al. (2014) proposed a fast memetic algorithm called MATLS, which embeds multiple complexity reduction methods to solve large-scale TTP instances. Faulkner et al. (2015) explored multiple operators for optimising the packing plan, combining them in a number of simple and complex heuristics. The work in El Yafrani and Ahiod (2016) proposed and compared two heuristic algorithms, a memetic algorithm (MA2B) and a simulated annealing-based algorithm (CS2SA), which achieved competitive performance for various problem sizes. Wagner (2016) investigated the use of swarm intelligence approaches with two different TSP-specific local search operators, and of "boosting" TTP solutions using TTP-specific local search. Two algorithms, named CS2SA* and CS2SA-R, were proposed in El Yafrani and Ahiod (2018), combining the 2-OPT steepest ascent hill climbing algorithm for the TSP component with the simulated annealing metaheuristic for the KP component. The obtained results showed that the proposed algorithms are competitive on many TTP instances.
In Wagner et al. (2018), the authors created a dataset with performance data of 21 TTP algorithms on the full original set of 9720 TTP instances. They also defined 55 characteristics for TTP instances that can be used to select the best algorithm on a per-instance basis, and they used these algorithms and features to construct algorithm portfolios for TTP in order to outperform the single best algorithm.
Recently, the authors in Nikfarjam et al. (2022) investigated the interdependency of the TSP and the KP by means of quality diversity (QD) approaches, conducting experimental studies that show the usefulness of applying QD to the TTP. In addition, the authors introduced a MAP-Elites-based evolutionary algorithm called BMBEA, using well-known TSP and KP search operators; in MAP-Elites, solutions compete with each other to survive. Their results showed that the QD approach can improve the best-known solutions for a wide range of TTP instances.

Empirical algorithm analysis
Fitness landscapes represent the association between the search process and the fitness space (Watson 2010). A heuristic algorithm can be seen as a strategy for navigating the solution landscape structure in the search for an optimal solution. Thus, fitness landscape analysis is a set of tools and methods used to investigate the dynamics of heuristic search algorithms applied for specific optimisation problems (Liefooghe et al. 2013).
Cost models are fitness landscape methods that can help predict the performance of algorithms by identifying features that make a problem more or less difficult to solve. These models can be expressed as linear or multiple regression models, decision trees (Ochoa and Malan 2019), or other models of features and search cost; some models are more amenable to human interpretation than others. To aid interpretability, the features are extracted from the problem structure, and the model can at times explain their influence on the difficulty level during the search.
Some authors have presented fitness landscape analyses for several problems, as surveyed in Zou et al. (2022). The work by Merz and Freisleben (2000) studied the relation between features of fitness landscapes and recombination and/or mutation operators for quadratic assignment problems. Richter (2008, 2013) analysed the fitness landscape of a dynamic optimisation problem, investigating its influence on the performance of the algorithm. Janković and Doerr (2019) designed a prediction model for the algorithmic performance of CMA-ES variants using a random forest regression model based on exploratory features of fitness landscapes. Also, Lu et al. (2019) presented a spatial-domain fitness landscape analysis framework to visualise the fitness landscapes of a specific combinatorial optimisation problem and evaluate its properties; extracting characteristics of combinatorial optimisation problems allowed the behaviour of algorithms to be studied effectively.
Recently, Li et al. (2021) focused on the adaptability landscape features of optimisation problems by applying a differential evolution algorithm. The authors presented a quantitative analysis of the fitness distance correlation information, evaluating the difficulty of solving the problem.
Some papers explored TTP using fitness landscape analysis. In Wuijts and Thierens (2019), the authors considered local search operators and investigated the fitness landscape characteristics of some smaller instances of the TTP. The local search operators included 2-opt, Insertion, Bitflip, and Exchange and metaheuristics included multi-start local search, iterated local search, and genetic local search.
Another recent work, El Yafrani et al. (2022), investigated 3 dependency models of the TTP, in addition to a dependency-free model for comparison purposes. The authors used local optima networks as a fitness landscape analysis tool to study the difficulty of the search landscape for these dependency models. The results show that the dependency-free landscape is the most difficult to navigate based on the basin sizes and their correlation with the fitness values. Such a result is difficult to interpret, as the dependencies are expected to create more difficult instances (Bonyadi et al. 2019). The authors speculated that this result is due to the join neighbourhood search algorithm, an algorithm that joins two neighbourhood sets into one, which creates additional complexity even for the dependency-free model.

Generalised TTP model
Herein, we provide the mathematical formulation of the generalised TTP, based on the standard TTP as introduced in Sect. 2 and the TTP2 formulation in Bonyadi et al. (2013). The model embeds two types of dependency and is conceived such that the strength of these dependencies is tunable through problem features.
Input data We define the following problem input parameters:
• N = {1, ..., n} is the set of labels representing the cities to be visited,
• {d_ij} is the matrix of distances between these cities,
• M = {1, ..., m} is a set of labels corresponding to the items scattered among the cities,
• p_k and w_k represent the profit and weight of item k ∈ M, respectively,
• W is the knapsack capacity,
• R is the knapsack renting rate, which is used to determine the cost of the journey,
• v_max and v_min represent the maximum and minimum velocities, respectively,
• A = {A_1, ..., A_m} denotes the availability vector, such that A_k ∈ {1, ..., n} is the reference to the city that contains item k.
Decision variables A TTP solution is represented using two components: The first is the tour X = (x_1, ..., x_n), a vector containing the ordered list of cities, encoded as a permutation. The second is the picking plan Z = (z_1, ..., z_m), a binary vector representing the states of the items, where 1 is associated with packed items and 0 with unpacked ones.
Interdependencies What makes the two components of the TTP interdependent is the velocity of the thief, which changes according to the weight of the knapsack. Therefore, the velocity at city x_i is given by Eq. 1:

v_{x_i} = v_max − w_{x_i} × (v_max − v_min) / W,    (1)

where w_{x_i} is the weight of the knapsack at city x_i. In addition to the velocity change dependency in Eq. 1, we add a second dependency on the value of items. This is achieved by imposing that the value of a picked item k drops over time from its initial value p_k to p_k^final following Eq. 2.
p_k^final = p_k × D_r^(Q × T_k),    (2)

where D_r is the item value dropping rate, p_k is the initial value of item k, T_k is the carrying time of item k, and Q is a constant calculated as shown in Eq. 3:

Q = ln(P_min / (2 × P_max)) / (T_min × ln(D_r)).    (3)

Eq. 3 seeks to satisfy P_max × D_r^(Q × T_min) = (1/2) × P_min, where P_max and P_min represent the maximum and minimum item values, respectively, while T_min is the minimum travelling time.
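To make Eqs. 1-3 concrete, the two dependency rules can be sketched in a few lines of Python. This is our reconstruction: the function names are ours, and the closed form of Q is derived by us from the calibration condition above.

```python
import math

def velocity(w_xi, v_max, v_min, W):
    # Eq. 1: velocity decreases linearly from v_max (empty knapsack)
    # to v_min (knapsack filled to capacity W).
    return v_max - w_xi * (v_max - v_min) / W

def dropped_value(p_k, D_r, T_k, Q):
    # Eq. 2: value of item k after being carried for time T_k.
    return p_k * D_r ** (Q * T_k)

def scaling_constant(P_max, P_min, T_min, D_r):
    # Eq. 3: Q chosen so that P_max * D_r**(Q * T_min) = P_min / 2.
    # Undefined for D_r = 1, where the dependency is deactivated anyway.
    return math.log(P_min / (2 * P_max)) / (T_min * math.log(D_r))

# An empty knapsack travels at v_max, a full one at v_min:
assert velocity(0, 1.0, 0.5, 25) == 1.0
assert velocity(25, 1.0, 0.5, 25) == 0.5

# The calibration condition of Eq. 3 holds (all numbers are made up):
Q = scaling_constant(P_max=100, P_min=10, T_min=4.0, D_r=0.8)
assert abs(dropped_value(100, 0.8, 4.0, Q) - 5.0) < 1e-9
```

Note that `dropped_value(p_k, 1.0, T_k, Q)` returns `p_k` for any carrying time, which is exactly how setting D_r = 1 deactivates the dependency.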
Furthermore, the dependencies in the generalised TTP model are tunable through the parameters v_min ∈ [0, v_max] and D_r ∈ [0, 1], where:
• The velocity drop dependency is controlled through v_min. When v_min = v_max, this dependency is cancelled, as the thief will always travel at the maximum velocity v_max independently of the knapsack load.
• The item value drop dependency is controlled through D_r. Similarly, this dependency is deactivated by setting D_r = 1.
Based on the above, the proposed model covers all dependency models in El Yafrani et al. (2022), including the standard TTP.
There are two special cases to consider: (1) Setting v_min = 0 can lead to a velocity of 0. Based on Eq. 1, v_{x_i} = 0 when w_{x_i} = W, which leads to the thief remaining stationary if the knapsack load reaches the maximum capacity W. (2) Setting D_r = 0 makes all the picked items valueless in the end, as p_k^final = 0. Based on these remarks, the considered values for v_min and D_r are always chosen strictly positive in our empirical study.
Note that it is difficult to predict how a change in dependency feature values, e.g. lower values of D r and v min , will impact the difficulty of the problem instances. This partially depends on the algorithm used to tackle the problem.
To illustrate how these interdependencies work, we consider a TTP instance with 6 cities and 20 items, and a potential solution, as shown in Fig. 1. Note that this is a simplified illustration: the distances, item values, and weights, among other problem parameters, are not shown. Following the velocity change dependency, and assuming that v_min < v_max, the velocity starts decreasing from city B, where items 1 and 4 are picked; passing through city C does not influence the velocity, as no items are picked there, and so on. Considering the item value drop dependency, items 1 and 4 lose most of their value, as they are carried for almost the entire journey, while item 17 loses the least of its initial value.

Objective function
To focus on the dependency analysis, we consider a linear combination of the total profit and the cost of the journey, as shown in Eq. 4:

G(X, Z) = Σ_{k∈M} p_k^final × z_k − R × Σ_{i=1}^{n} t_{x_i, x_{i+1}},    (4)

where t_{x_i, x_{i+1}} = d_{x_i x_{i+1}} / v_{x_i} is the travel time from x_i to x_{i+1} (with x_{n+1} = x_1). As mentioned earlier, the introduction of the velocity change and item value drop dependencies leads to a nonlinear objective function. Note that the nonlinearity exists in the first term of the equation as well, since p_k^final has a nonlinear formulation (Eq. 2) and depends on Z. This increases the difficulty of the problem, as it cannot be solved using standard MILP methods and rather requires an ad hoc solution.
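The evaluation of the objective can be sketched as follows. This is a simplified, assumption-laden illustration: `city_of` (our name for the availability vector A) maps items to cities, and the final item values `p_final` are assumed precomputed via Eq. 2, although in the full model they themselves depend on the carrying times induced by the tour.

```python
def total_gain(tour, picking, dist, w, p_final, city_of, R, v_max, v_min, W):
    # Eq. 4: sum of final item values minus the renting rate R times
    # the total travel time, using the velocity of Eq. 1 on each leg.
    load, time = 0.0, 0.0
    n = len(tour)
    for i, city in enumerate(tour):
        for k, c in enumerate(city_of):
            if c == city and picking[k]:
                load += w[k]                      # pick up items at this city
        v = v_max - load * (v_max - v_min) / W    # Eq. 1
        nxt = tour[(i + 1) % n]                   # the thief returns to the start
        time += dist[city][nxt] / v
    profit = sum(p for p, z in zip(p_final, picking) if z)
    return profit - R * time

# Toy instance (made-up numbers): 3 cities at unit distances, one item
# of weight 5 with final value 10, located at city 1.
dist = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
g = total_gain([0, 1, 2], [1], dist, [5], [10], [1], 1.0, 1.0, 0.5, 10)
```

In the toy run, the thief travels the first leg empty (velocity 1.0) and the last two legs half-loaded (velocity 0.75), so the gain is 10 minus the renting cost of the slowed journey.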

Memetic algorithm
As a baseline for our analysis, we consider the memetic algorithm presented in Algorithm 1. The algorithm uses genetic operators (tournament selection and crossover) combined with a hill climbing local search algorithm to evolve a population of candidate solutions iteratively. The implementation is based on the memetic algorithm MA2B initially designed for the standard TTP (El Yafrani and Ahiod 2016), but differs from the original implementation by removing all domain knowledge from the logic of the algorithm. (The implementation of the memetic algorithm is done in Java based on the code available at https://github.com/yafrani/ttplab.)
Algorithm 1 Memetic Algorithm for the Generalised TTP [pseudocode omitted]

The motivation behind choosing a memetic algorithm is that it borrows aspects from both classical local search heuristics and evolutionary algorithms (exploratory operators) (Wang and Wang 2022). Most of the other algorithms are either based on (stochastic or deterministic) local search or include domain knowledge from the standard TTP model, which is practically unusable for the generalised formulation we propose (due to the second dependency, the item value drop). These aspects combined result in an algorithm that can both explore and exploit the solution space. Furthermore, the other TTP heuristics use problem knowledge as the main component of their logic (Mei et al. 2014; Faulkner et al. 2015), making them unsuitable for this analysis.
Notations used in Algorithm 1 are presented in Table 1. Note that, to simplify the pseudocode notation, we use the same notation to evaluate a solution and to access its objective value. Therefore, only the first call of the evaluation function G(.) is considered to increase the counter of the number of evaluations T, which will be used to calculate the expected runtime (ert). Furthermore, the algorithm does not take the dependencies into account as part of the domain knowledge used for the optimisation. This is done on purpose, to avoid giving the algorithm an unfair advantage on some specific instance categories, and to obtain insights which can be used to improve the efficiency of the algorithm. Each solution in the population is initialised with a random permutation for the tour and a random binary vector for the picking plan (line 4). As we are interested in the features making the problem difficult, random initialisation is important for the same reason stated above, i.e. ensuring a fair algorithm that does not favour any specific instance categories. A best-improvement hill climbing procedure is then used to refine the solutions by finding a local optimum. The hill climbing algorithm applies two neighbourhood searches, for the tour and the picking plan, sequentially. A simplified pseudocode of the hill climbing procedure is shown in Algorithm 2. The N_2-OPT(S) neighbourhood function returns a set of solutions where a new tour is obtained by applying the 2-OPT operator (Croes 1958) on the tour of S, while the picking plan is copied from S as is. The same logic is followed for the N_bit-flip(S) function (Eiben and Smith 2015), where only the picking plan is updated. More details about these operators can be found in Eiben and Smith (2015).
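The two neighbourhoods and the best-improvement loop can be sketched as follows. This is our simplification of the procedure: the names are ours, the objective G is assumed given, and knapsack feasibility handling is omitted.

```python
def n_2opt(tour):
    # All tours obtained by reversing one segment (2-OPT moves).
    n = len(tour)
    for i in range(n - 1):
        for j in range(i + 1, n):
            yield tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def n_bitflip(plan):
    # All picking plans obtained by flipping a single item state.
    for k in range(len(plan)):
        yield plan[:k] + [1 - plan[k]] + plan[k + 1:]

def hill_climb(tour, plan, G):
    # Best-improvement hill climbing: alternate the two neighbourhoods,
    # moving to the best neighbour until neither yields an improvement.
    improved = True
    while improved:
        improved = False
        best_t = max(n_2opt(tour), key=lambda t: G(t, plan))
        if G(best_t, plan) > G(tour, plan):
            tour, improved = best_t, True
        best_z = max(n_bitflip(plan), key=lambda z: G(tour, z))
        if G(tour, best_z) > G(tour, plan):
            plan, improved = best_z, True
    return tour, plan
```

For a toy objective that rewards packing items, e.g. `G = lambda t, z: sum(z)`, the climber flips the picking plan bits one by one until all items are packed.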
The population is then sorted, and the best solution is identified (lines 10-15). Tournament selection is applied to select 2 parent solutions, with a tournament size of N_tournament, to produce N offspring (line 18). This is followed by the maximal preservative crossover. Once the offspring is generated, it is combined with the population, and the best solutions are kept for further improvement (lines 33-34). The algorithm stops when the maximum number of evaluations T_max is reached, or a near-optimal solution (a solution with a gap to the optimum smaller than ε) is found (line 35).
The parameters of Algorithm 1 are summarised in Table 2. The fine tuning of the parameters takes into account the size of the instances used for the analysis, with values chosen empirically based on a random sample of instances (diversified based on the features in Sect. 4) and on previous studies (El Yafrani and Ahiod 2016; El Yafrani et al. 2022). This is to ensure that the algorithm finds a near-optimal solution in most cases, which is important for conducting the analysis.

Estimation of the expected runtime (ert)
One way to measure the performance of an algorithm A (search cost) is to consider the expected number of function evaluations necessary to achieve an ε-approximation. To this end, we record the number of function evaluations until an ε-approximation is found, which characterises a "success". Otherwise, the search cost is set to a predetermined maximum T_max. This approach is similar to the ones presented by Hansen et al. (2010) and Liefooghe et al. (2015).
In this approach, we denote by p_s ∈ [0, 1] the probability of success of algorithm A, and by T_f the random variable measuring the number of function evaluations of unsuccessful runs (failures). After (t − 1) failures, each requiring T_f evaluations, and a final successful run of T_s evaluations, the total runtime is T = Σ_{i=1}^{t−1} T_f + T_s, where t is the random variable representing the number of runs; t follows a geometric distribution with parameter p_s. Equation 8 takes the expectation by considering independent runs for each instance, stopping at the first success:

E[T] = (E[t] − 1) × E[T_f] + E[T_s].    (8)

Here, the estimated success rate (p_s) is computed as the ratio of successful runs over the total number of executions. The expectation of a geometric distribution with parameter p_s is equal to 1/p_s. The expected runtime for unsuccessful runs E[T_f] is set as a constant limit (T_max) on the number of function evaluation calls, and the expected runtime for successful runs E[T_s] is estimated as the average number of function evaluations performed by successful runs. Given these assumptions, ert can be expressed as an estimation of the expected runtime E[T], as presented in Eq. 9:

ert = ((1 − p_s) / p_s) × T_max + (1/t_s) × Σ_{i=1}^{t_s} T_i,    (9)
where t s is the number of successful runs, and T i is the number of evaluations for successful run i.
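The estimator translates directly into code; a minimal sketch with names of our choosing:

```python
def expected_runtime(evals, success, T_max):
    # ert estimate: evals[i] is the number of evaluations of run i,
    # success[i] whether it reached an epsilon-approximation.
    # Unsuccessful runs cost T_max each; runs restart until a success.
    t_s = sum(success)
    if t_s == 0:
        return float("inf")   # handled as an outlier in the analysis
    p_s = t_s / len(evals)
    mean_success = sum(e for e, s in zip(evals, success) if s) / t_s
    return ((1 - p_s) / p_s) * T_max + mean_success
```

For example, two successful runs of 100 and 200 evaluations plus one failure capped at T_max = 1000 give p_s = 2/3 and an ert of 0.5 × 1000 + 150 = 650 evaluations.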

Experimental setting
The experimental framework consists of generating enumerable instances based on the problem features of interest, then running the algorithm and modelling the expected runtime (ert). The instance generator considers the following features:
• Number of cities (n): Represents the number of cities to visit. This feature is not used as an input to the regression model. Instead, the experiments are replicated for multiple values of n ∈ {5, 6, 7, 8}. Small values of n are chosen in order to be able to enumerate the solutions, which is used to derive the optimal values, as there are no exact methods able to solve the problem. Note that we will exclude some illustrations for n = 5, 6, 7 for the sake of simplicity, but result summaries will be given for all values of n.
• Dropping rate (D r ): Represents the dropping rate at which the value of an item decreases through time as shown in Eq. 2. D r takes values from the set {0.7, 0.75, 0.8, . . . , 1}.
As this feature can be considered an ordinal variable, numerical values (between parentheses) are assigned to represent the correlation strength, which will be useful for the regression analysis. C is a factor occurring in the maximum weight of the knapsack which is given in Eq. 10.
Note that values lower than 3 have been excluded as they can result in capacities smaller than the smallest item weight.
The features D_r and v_min are considered as they control the strength of the dependencies, while the motivation for choosing T and C is to compare their impact on the expected runtime with that of the dependency features. Furthermore, the choice of T and C, instead of other features, is based on previous analyses.
10 instances are generated for each combination of feature values, resulting in a total of |n| × |D_r| × |v_min| × |T| × |C| × 10 = 67,200 instances. Then, 30 independent runs of Algorithm 1 are performed for each generated instance.
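The optimal values needed to define a "success" are obtained by exhaustively enumerating all solutions of these small instances; a sketch (function names are ours, and the objective G of Eq. 4 is assumed given):

```python
from itertools import permutations, product

def optimal_gain(n, m, G):
    # Brute-force enumeration of all tours and picking plans to find
    # the optimal objective value of a small instance. Feasible only
    # for the small n used in this study: (n-1)! * 2**m solutions.
    best = float("-inf")
    for tour in permutations(range(1, n)):       # fix city 0 as the start
        for plan in product((0, 1), repeat=m):
            best = max(best, G((0, *tour), plan))
    return best
```

For n = 8 and a handful of items this remains cheap (5040 tours times 2^m picking plans), which is precisely why the study restricts itself to n ∈ {5, ..., 8}.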
Afterwards, the ert is calculated for each instance based on Eq. 9.
It is worth noting that the experiments result in 23 instances for which the algorithm fails to identify an optimal or near-optimal solution, i.e. p_s = 0, which results in a division by zero. To solve this issue, we set ert = ∞ and treat these results as outliers in the analysis. Note that, based on a deeper look into the individual results, we concluded that this happens across different categories (combinations of features), i.e. all the categories remain covered in the analysis.

Preliminary analysis of the runtime
The histograms in Fig. 2 show the distributions of ln(ert), where ln(.) denotes the natural logarithm, for the different values of n. The natural logarithm is only used to better visualise the ert data. Indeed, ert follows a distribution that is heavy-tailed (with a very high kurtosis, kurt[ert] = 1390.38 for n = 8) and asymmetric (with a skewness of μ_3 = 32.51 for n = 8).
The nature of ert makes it virtually impossible to use standard linear regression models. The most significant factor is the existence of outliers (large ert values, representing hard instances), which are difficult to properly include in the regression model. Note that the existence of heavy tails in the dependent variable is not a problem in itself, except when it leads to heavy-tailed residuals, which is the case here (based on experiments with linear regression and regression tree models).
It is usually favourable for the observed residuals to be approximately normally distributed. In order to obtain (approximately) normal residuals, a transformation, such as the logarithm, square root, or cubic root, should be applied to the ert. While the mentioned transformations can be efficient in taming outliers in many cases, the resulting residuals, even when using regression trees and other regression models, remain heavy-tailed because of the high variation in the ert values. A better alternative in this case is to use the reciprocal, 1/ert, which results in a non-normal, but smaller-tailed, distribution of the residuals for a regression tree, as shown in Fig. 3. Furthermore, in order to preserve the ert order and improve the illustrations, we instead apply the transformation 1 − min(ert)/ert, which does not impact the regression analysis results. We will refer to the resulting values as the runtime scores.
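The runtime-score transformation is monotone in ert, so it preserves the instance ordering while mapping the heavy right tail into [0, 1); a small illustration with made-up ert values:

```python
import numpy as np

ert = np.array([120.0, 480.0, 9600.0, 1e6])  # toy ert values (ours)
score = 1.0 - ert.min() / ert                # runtime scores in [0, 1)

# The easiest instance gets score 0; harder instances approach 1,
# and the ordering of the instances is unchanged.
assert score[0] == 0.0
assert np.all((score >= 0.0) & (score < 1.0))
assert list(np.argsort(score)) == list(np.argsort(ert))
```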
It is worth noting that 1/ert should not be interpreted as the expected rate. Indeed, as the reciprocal is a convex function, Jensen's inequality (Jensen 1906) gives E[rate] = E[1/T] ≥ 1/E[T] = 1/ert. Furthermore, as the distribution of ert is unknown and difficult to fit, it is hard to deduce E[rate] using the law of the unconscious statistician. Therefore, 1/ert should simply be interpreted as the reciprocal of the ert.
As linear models were not efficient in capturing the variability of the data, we utilise regression trees for two main reasons: (1) they are efficient in handling large data with outliers, and (2) they produce explainable outcomes, which is important to understand the features' impact on the expected runtime and to identify what makes some TTP instances harder to solve. In other words, regression trees offer a better trade-off between efficiency and explainability compared to the alternatives.

Regression analysis
In order to analyse the impact of the problem features on the difficulty of tackling the instances, one must map the problem features to the expected runtime needed to identify near-optimal solutions. This can be achieved using nonlinear regression analysis. Specifically, we consider regression trees using the mean squared error (MSE) to measure the quality of a split, and two parameters to control the model's complexity. The first is the maximum depth, which represents the maximum number of node splits the regression tree makes before returning a prediction. The second is the minimum sample size, which represents the minimum number of samples at each leaf. We denote by RT^n_{d_max,s_min} the regression tree obtained for problem size n, using the parameter values d_max as the maximum depth and s_min as the minimum sample size during the training phase.
Model tuning is achieved using 5-fold cross-validation and a grid search over different values of d_max and s_min. Figure 4 shows a heatmap of the trade-off between the regression tree complexity and the model fitness for n = 8. A maximum R^2 of 0.53 can be reached using the regression model RT^8_{9,40}, i.e. the corresponding model can explain 53% of the variation in expected runtime that is predictable from the problem features. The remaining 47% of the variation can be attributed to many other aspects, the most likely being the stochasticity of the considered memetic algorithm, which leads to high variation in the ert within some instance categories (combinations of features), and high variation in the number of evaluations for the same instances. Another, less likely, scenario is that there are combinations of features that the regression tree was not able to identify. This is believed to have minimal impact, as other machine learning algorithms (random forests and neural networks) were investigated and did not result in a better coefficient of determination. Naturally, more complex trees can result in higher R^2 values. Nevertheless, this comes at the cost of losing the ability to interpret the model. For the purpose of this analysis, the goal is to understand the impact of the feature combinations on the expected runtime. Therefore, we favour small explainable trees, even if they explain a lower percentage of the variability between the features and the expected runtime.
Based on the above, we consider two regression models for each n. RT^5_{6,280}, RT^6_{6,20}, RT^7_{7,20}, and RT^8_{9,40} are the models with the highest accuracy, while RT^5_{3,380}, RT^6_{3,380}, RT^7_{3,380}, and RT^8_{3,380} represent a good trade-off between tree complexity and quality of fit for all n values. The evaluation metrics of the two resulting models are shown in Table 3 for each value of n, where the first model corresponds to the explainable one, and the second to the one with the highest coefficient of determination.
In general, Table 3 shows that the loss in quality between the simple and complex models, in terms of error metrics and R^2, is minimal and can be neglected given the gain in explainability. Nevertheless, a difference in performance can be noticed between the models for each n value. Looking back at the different ert distributions, this can be explained by the fact that, as n grows, the variance in ert decreases.
On the one hand, the complex regression trees (RT^5_{6,280}, RT^6_{6,20}, RT^7_{7,20}, and RT^8_{9,40}) can provide useful global insights on the importance of features and on the extent to which we can explain the variability in the runtime variable, but it can be less practical to extract which combinations of feature values lead to a specific expected runtime. On the other hand, the explainable regression trees (RT^5_{3,380}, RT^6_{3,380}, RT^7_{3,380}, and RT^8_{3,380}) achieve a competitive quality while having the additional benefit of being explainable, allowing us to draw useful conclusions on what makes instances easier or harder to solve. Table 4 provides a macroscopic idea of the impact of the problem features using the Gini-Simpson index (Jost 2006). The feature importance values vary between the two models, but are aligned in the sense that they rank the features similarly. For this holistic investigation, we focus on the models having the better quality of fit.
The results show that the capacity class (C) is the most influential feature contributing to the difficulty of instances. It is followed by the dropping rate (D_r) and the profit-weight correlation (T), while the minimum velocity (v_min) has a much lower importance index. Comparing the two dependency features shows that items losing value over time is highly important for predicting the efficiency of the algorithm in tackling the problem, whereas the change of the thief's velocity with the knapsack load has minimal impact on predicting the algorithm's performance. Figure 5 shows the evolution of the feature importance indices in terms of n. The dependency features tend to have a higher importance index as the problem size increases, and this phenomenon is stronger for D_r than for v_min. The same can be said about the feature T. On the other hand, the importance index of C decreases significantly. While the above analysis can help in identifying important features (and potentially eliminating less important ones), a more in-depth analysis should be done at a microscopic scale, which can be achieved using RT^n_{3,380}. Figures 6, 7, 8, and 9 show the resulting regression tree models {RT^n_{3,380}}, n ∈ {5, ..., 8}, which will help us compare the regression logic across different instance sizes. Additionally, in Appendix 1 we reproduce the regression models for n = 6, 7 to show that the results can be generalised, i.e. they depend on the problem features rather than on the particular generated instances.
Based on this, we observe that the regression models follow a similar logic, which can be clearly seen in the branching nodes. Small deviations exist in the branching nodes and predicted values. Looking at all four models, we can report the following general observations and explanations:
• Setting D_r = 1 recovers the standard TTP model, as the dropping rate dependency is deactivated, making the minimum velocity feature the main separator since it represents the velocity drop dependency. Furthermore, lower minimum velocities (v_min) result in slightly harder instances.
• D_r has a clear impact on the difficulty of instances. Lower dropping rate values result in more difficult instances.
• Larger knapsack capacities (C) result in harder instances, as clearly seen in branches B_2, B_4, B_5, and B_6. This contradicts the findings in El Yafrani et al. (2018, 2022), where the difficulty of instances is associated with the size of basins of attraction, in the context of a local search algorithm applied to the standard TTP. Further experiments considering only standard TTP instances (D_r = 1) show a similar behaviour, confirming that larger knapsack capacities increase the difficulty of the instances. This suggests that the contradiction is due to the nature of the search algorithm, not to the complexity added by the dropping rate dependency. Indeed, the local optima network analysis is only suitable for embedded neighbourhood search algorithms, and its conclusions cannot be extrapolated to more sophisticated algorithms such as the considered evolutionary algorithms.
• For smaller capacities, the profit-weight correlation (T) has a significant impact on the problem difficulty. The hardest instances are the ones where the profit and weight are bounded and strongly correlated (T = 2).
It is interesting to see how the item value drop dependency results in significant gaps in the performance of the algorithm.
When it is deactivated (standard TTP), the instances can be solved relatively fast, and the velocity change dependency has only a small impact on the runtimes. When it is activated, the impact of the velocity change dependency is dominated by the capacity and correlation features.
A possible explanation of the stronger impact of D_r compared to v_min can be attributed to the formulation of the dependencies. Specifically, in the dropping rate dependency (see Eq. 2), the updated item values depend on both the previous item value and the time the item has been carried. In the velocity change dependency (see Eq. 1), on the other hand, the thief's updated velocity depends only on the knapsack load; the previous velocity is completely omitted. These findings show that the impact of different dependency relationships can differ significantly when formulating a problem with multiple components. In our case, the analysis shows that adding the item value drop dependency makes the TTP a much more challenging problem compared to the standard model, leading to harder instances, which is reflected in the expected runtime. Therefore, this aspect should be given more attention when investigating the TTP in particular, or other capacitated routing problems in general, especially because the impact of these features evolves with the problem size.
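The structural difference between the two dependencies can be made concrete with a small sketch. Since Eqs. 1 and 2 are not reproduced in this section, the exact forms below are assumptions: the velocity rule follows the standard TTP convention (linear interpolation between v_max and v_min with the load), and the value drop is taken to be multiplicative in the carrying time. The point is the structure, not the exact formulas.

```python
def velocity(load, W, v_max=1.0, v_min=0.1):
    # Velocity change dependency (cf. Eq. 1; form assumed): the updated
    # velocity depends only on the current knapsack load relative to the
    # capacity W -- the previous velocity plays no role, so the rule is
    # "memoryless".
    return v_max - (load / W) * (v_max - v_min)

def dropped_value(value, D_r, t):
    # Item value drop dependency (cf. Eq. 2; form assumed): the updated
    # value depends on both the previous value and the time t the item has
    # been carried, so the effect compounds along the tour. D_r = 1
    # deactivates the dependency (standard TTP).
    return value * (D_r ** t)
```

Under these assumed forms, the compounding of `dropped_value` over the tour is consistent with the stronger observed impact of D_r, while `velocity` resets at every load change.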
Recognising that different dependencies can have significantly different impacts on the problem's difficulty is important. However, one can go beyond this and seek ways to use these results to improve how such problems are tackled.
It is possible to estimate how much time would be needed to find a good approximation for a specific problem instance based only on known features. One way to directly use the results reported here is therefore to make an informed and automated decision on the maximum number of iterations (or the time budget) needed to achieve near-optimality for a given instance. It is also possible to revisit the model and simplify it to reduce the impact of particular dependencies; this should be done carefully, as oversimplifying the model can lead to undesirable outcomes. Another possibility is to use these results as domain knowledge embedded in the solution methods, for example by clustering instances into categories that can be tackled with different approaches based on their difficulty, or by tuning the heuristic parameters based on these instance clusters.
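The clustering idea above can be sketched minimally: given runtime predictions from a fitted regression tree, bucket instances into difficulty classes by quantile. This is a hypothetical helper, not part of the paper's methodology; the numbers are invented for illustration.

```python
import numpy as np

def difficulty_clusters(predicted_ert, n_bins=3):
    # Hypothetical sketch: bucket instances into n_bins difficulty classes
    # (e.g. easy / medium / hard) from predicted expected runtimes, so each
    # class can be handled by a differently tuned heuristic or given a
    # different evaluation budget.
    edges = np.quantile(predicted_ert, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(predicted_ert, edges)

# Invented predicted runtimes for six instances.
erts = np.array([120, 95, 4000, 150, 3800, 900])
print(difficulty_clusters(erts))
```

The resulting labels (0 = easiest bucket) could then drive per-cluster parameter tuning or termination budgets.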

Conclusion
In this paper, we investigated the difficulty of the Travelling Thief Problem under two types of inter-component dependency using performance prediction models (regression analysis). We introduced a tunable model that allows controlling the velocity change and item value drop dependencies through associated problem features. The impact of these features on the expected runtime was then evaluated to understand how they shape the search landscape for a memetic algorithm.
This study allowed us to better understand what makes instances harder and gave us new insights into the impact of the problem features. The analysis shows that including the item value drop dependency leads to harder instances, which is reflected in the expected runtime. The results also showed how the impact of these features evolves with the problem size. In particular, the dropping rate tends to become more important as the problem grows, and its impact is stronger than that of the velocity change dependency.
A continuation of this study is to map these features to the choice of algorithm parameters. As future work, we also intend to explore other fitness landscape techniques such as fitness clouds, auto-correlation, time to local optimum, distance to the global optimum, and information analysis: entropy, information stability, partial information content, and density basin information.
Funding No funding was received to assist with the preparation of this manuscript.
Data availability Enquiries about data availability should be directed to the authors.

Declarations
Conflict of interest None of the authors of this paper have a financial or personal relationship with other people or organisations that could inappropriately influence or bias the content of the paper. The authors have no competing interests to declare that are relevant to the content of this article.
Ethical approval This paper does not contain any studies with human participants or animals performed by any of the authors. This manuscript is the authors' original work and has not been published nor has it been submitted simultaneously elsewhere. All authors have checked the manuscript and have agreed to the submission.

Additional results
In this appendix, we show the results of additional experiments on different sets of instances for n = 6 and n = 7. The same process defined in the earlier sections was used to create them, only with different seeds for the random number generator. The goal is to show that different sets of instances yield roughly the same regression model, i.e. regression trees with a similar logic, as shown in Fig. 10.
When comparing these two with the corresponding trees in Figs. 7 and 8, we can see that the conditions at the inner nodes are almost always identical (i.e. for 10 of 13 inner nodes) or very similar, and the respective errors (shown in Table 5) and sample numbers are very close matches, too. Hence, we conclude that even though the methodology is based on randomly created instances and even though it employs a memetic algorithm as a randomised search heuris-