Improved metamodels for predicting high-dimensional outputs by accounting for the dependence structure of the latent variables: application to marine flooding

Metamodelling techniques (also referred to as surrogate modelling) have shown high performance to overcome the computational burden of numerical hydrodynamic models for fast prediction of key indicators of marine flooding (e.g. total flooded area). To predict flood maps (e.g. the spatial distribution of the maximum water depth during a flood event), a commonly-used approach is to rely on principal component analysis to reduce the high dimensionality of the flood map (related to the number of pixels, typically several thousands) by transforming the spatial output into a low number of latent variables (typically < 10). One commonly-used approach is to build one metamodel per latent variable by assuming independence between the latent variables. Using two real cases of marine flooding, we show that the predictive performance of the metamodelling approach (relying on kriging metamodels) can be significantly improved when the dependence structure of the latent variables is accounted for. Our tests show that the most efficient approach relies on clustering in the space of the latent variables (here with the k-means algorithm). Complementing the approach with a kriging metamodel specifically dedicated to handling vector-valued variables yields a further gain in predictability, but only if two conditions are jointly met: (1) the number of training samples is sufficiently high to learn the dependence structure over the respective clusters; (2) a careful selection of the number of clusters has been performed.


Introduction
Over the last decade, advances in the numerical simulation of hydrodynamic processes have tremendously improved forecasting, hazard and risk assessment, as well as the design of early warning systems (denoted EWS) for coastal flooding hazards like storm-related marine submersion (e.g. Chaumillon et al. 2017). However, the numerical simulation of those processes at a resolution useful for flood hazard/risk assessment is still hampered by high computational costs (i.e. run times larger than the simulated time) despite increasing computer power in recent years. This is especially challenging for estimating marine flooding due to wave processes, because accounting for these processes implies resolving the short-wave phase, which requires very fine discretisation in space (a few meters) and time (i.e. high frequencies of several Hz) (e.g., Bertin 2016) and results in very large computational times. This computational burden makes the integration of such models within EWS for marine flooding very difficult (as discussed for instance by Denamiel et al. (2021) for the design of extreme sea-level EWS), and might hamper their application in analyses involving a large number of numerical model calls (typically of the order of 1000-10,000), like ensemble forecasting, probabilistic flooding hazard assessment (PFHA) or uncertainty quantification (UQ). For instance, Laborie et al. (2020) provide a good illustration of the required computational effort: assessing the influence of forcing uncertainties on the hydrodynamics of the Gironde estuary required 250,000 model runs, for a simulation time of 101 days on 32,768 cores using GENCI HPC computational resources (https://www.genci.fr/en).
A possible approach to overcome this problem is to rely on the statistical analysis of existing databases of results that were pre-calculated using full-process, high-fidelity numerical simulations. This approach, named ''metamodelling'' (also referred to as a ''proxy'', ''surrogate'' or ''response surface''), has shown promising results in multiple contexts, either for EWS (e.g., Idier et al. 2021), for ensemble forecasts (Lecacheux et al. 2021), for PFHA (e.g. Rueda et al. 2016) or for UQ (e.g., Li et al. 2020; Perrin et al. 2021), by providing predictions with a reasonable prediction time cost (less than a few minutes).
One major difficulty in the implementation of metamodels for coastal flooding is related to the high dimension of the flooding numerical model's output, which generally varies over space; a typical example being a flood map representing the maximum value (over time) of the water depth (i.e. the difference between the water level and the topographic elevation) as a function of the spatial location. In practice, the locations are discretised, and the output is represented by a high-dimensional vector whose length corresponds to the number of pixels. Depending on the processes involved in the flooding (overflow, wave-induced overtopping, breaching of coastal defences), the discretisation can be fine (down to a few meters), hence resulting in maps with typically several thousands of pixels. This large dimension makes training a separate metamodel at each pixel impractical.
An alternative approach is to combine the metamodelling approach with a dimension reduction technique whose objective is to extract a much smaller number of latent outputs to represent the very high-dimensional output. A popular approach is based on principal component analysis, denoted PCA (Jolliffe 2002). PCA is a linear projection, which makes it straightforward to transform the response from the latent space back to the original space. The combined PCA-metamodel procedure has been successfully implemented for marine flooding by Jia et al. (2013) for hurricane-induced waves, by Ma et al. (2022) for hurricane-induced surges, by Li et al. (2020) in an estuary context, by Roy et al. (2018) for river flow, but also in various other application areas such as chemical process modelling (Chen et al. 2011), reservoir engineering (Thenon et al. 2016), urban drainage (Nagel et al. 2020), atmospheric modelling (Ryan et al. 2018), computer code calibration (Higdon et al. 2008), climate modelling (Chang et al. 2014), etc. Several improvements have been proposed in the literature, relying on: (1) functional bases using e.g. B-splines or wavelets (Marrel et al. 2011; Perrin et al. 2021), or optimally selected bases, e.g. for calibration (Salter et al. 2019); (2) non-linear projection techniques like autoencoders with additional clustering approaches to identify complex input-output relationships (El Garroussi et al. 2022), kernel versions of PCA (Lataniotis et al. 2020) or Grassmannian diffusion maps (Kontolati et al. 2022a); see the comprehensive survey by Kontolati et al. (2022b); (3) data imputation techniques (Kyprioti et al. 2022) to handle dry nodes in the training database.
Our objective is to address another line of potential improvement of the combined PCA-metamodelling approach, namely integrating the dependence structure of the latent outputs. By construction, PCA finds a linear projection that transforms the set of original correlated variables onto a projected space of new uncorrelated variables (named latent outputs and denoted z in the following) by maximising the variance of the projected space. The absence of correlation, however, does not necessarily mean absence of dependence. This is exemplified in Fig. 1 with the PCA results for real case 1 described in ''Real case applications'' Section. Figure 1 shows the relation between three latent variables (z1, z2, z3) derived from PCA (with prior mean centering), which together explain almost 99% of the variance. Near-zero Pearson correlations are reported (upper right part of Fig. 1), whereas a clear nonlinear relation is visible (lower left part of Fig. 1). However, a widely-adopted assumption is to neglect this dependence structure by learning one metamodel per latent output in turn, i.e. by considering the latent variables independently and separately.
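The distinction between zero correlation and independence can be illustrated numerically. The sketch below (synthetic data, not the paper's flood outputs) builds a latent variable that is a deterministic, hence fully dependent, function of another, yet has near-zero Pearson correlation with it:

```python
# Illustrative sketch: two variables can be uncorrelated yet strongly dependent.
import numpy as np

rng = np.random.default_rng(0)
z1 = rng.normal(size=20_000)
z2 = z1**2 - 1.0          # deterministic (hence dependent) function of z1

# Pearson correlation is ~0 because E[z1^3] = 0 for a symmetric distribution
corr = np.corrcoef(z1, z2)[0, 1]
print(f"Pearson correlation: {corr:.3f}")  # close to 0 despite full dependence
```

A metamodelling strategy that only inspects pairwise correlations would treat z1 and z2 as unrelated, which is precisely the situation observed in Fig. 1.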
In this study we aim at investigating two questions: (Q1) does accounting for the latent outputs' dependence improve the prediction performance?; (Q2) how to integrate this dependence in the metamodelling procedure? In the following, we address these questions by focusing on one class of metamodelling, namely kriging (Williams and Rasmussen 2006), because of its wide use in many coastal engineering applications (see afore-mentioned references), in particular when the number of training samples is small-to-moderate (typically 100-500), and because of the possibility to treat multi-output regression problems (Alvarez et al. 2012).
The paper is organized as follows. In ''Real case applications'' Section, we first present two real cases that motivated this study. In ''Methods'' Section, the statistical methods and the different kriging-based metamodelling approaches to account for the dependence are described. In ''Application'' Section, we apply PCA and analyse the latent outputs' dependence. On this basis, the different metamodelling approaches are applied and their predictive performance is assessed. In ''Discussion'' Section, we further discuss the robustness of the results to our main assumptions.

Real case applications
In this section, we describe in more detail two real case applications, namely a case of storm-induced marine flooding on the French Atlantic coast (''Real case 1'' section), and a case of marine flooding induced by cyclonic waves in Reunion Island (''Real case 2'' section). Both study sites were selected because they are representative of two distinct situations, i.e. overflow-induced flooding on a relatively large area of several km² that presents different land covers (like urban areas, various types of vegetation, marshes, sand dunes, etc.) for real case 1, and wave overtopping-induced flooding on a relatively small urban area for real case 2.

Real case 1
The first case study focuses on the Boucholeurs district and the natural reserve of Yves marshland located on the Atlantic coast of France (see Fig. 2) near La Rochelle city, in a low-lying natural coastal marsh and dune system, in the vicinity of Oléron Island (Fig. 2). The main hydrodynamic drivers are the tide and the storm surge (which results from the contribution of the regional wave set-up and the atmospheric storm surge), while the direct effects of waves can be neglected at this site. Overflow is thus the main flood process at this site and is simulated with the numerical code named MARS (''Model for Application at Regional Scales'', see Lazure and Dumas 2008) with adaptations to take into account specificities of local flooding processes (hydraulic processes around connections like nozzles, spillways, etc.). Further details on the numerical model implementation as well as its validation are provided in Rohmer et al. (2018) and references therein. In particular, this site was hit by the storm Xynthia (27-28 February, 2010), which was characterised by a high storm surge (> 1.5 m at the La Rochelle tide gauge) that was in phase with a high spring tide. It led to the inundation of several parts of low-lying areas, thereby causing large economic losses and casualties.
The physical forcing conditions that characterise the offshore meteo-oceanic conditions are summarised by scalar values describing the tide and surge time series, namely the high-tide level T (ranging between 0.95 and 3.25 m), the surge peak amplitude S (ranging between 0.65 and 2.50 m), the phase difference between the surge peak and the high tide t0 (ranging between −6 and +6 h), the time duration of the rising part of the surge t+ (ranging between 0.5 and 12 h) and the time duration of the falling part t− (ranging between 0.5 and 12 h). These bounds were chosen based on the analysis of the sea level records provided by the tide gauge stations near the study site (see more details in Rohmer et al. 2018: ''Methods'' section). In total, we consider 5 scalar independent input variables describing the offshore conditions x = (S, T, t0, t+, t−).

Fig. 1 Matrix of scatterplots (lower left part) showing the relation between the first three latent outputs z1-3 derived from PCA applied to the marine flooding case (case 1 described in Section ''Real case 1''). The subplots on the diagonal provide the smoothed probability density function. The upper right part provides the values of the pairwise linear Pearson's correlation coefficients (denoted Corr).
A set of 500 inputs' configurations were randomly generated via a Sobol' sequence (Sobol' 1967) by assuming that the offshore conditions are uniformly and independently distributed. For each input configuration (i.e. each event), we computed the map of maximum water depths (calculated over the time duration −6/+6 h around high tide). Among the 500 computed maps, a total number of 243 inputs led to flooding (see two examples in Fig. 3, bottom), and these were considered for the training of the metamodels (the design of experiments is provided in Supplementary Materials A). From a risk assessment viewpoint, the prediction of the 257 cases without flooding can be treated by relying on a classification approach as studied by Rohmer et al. (2018) on the same study site. In areas with a low probability of flooding, few cases may produce non-zero water depths, which may hamper the training of the metamodel and its predictive performance. To alleviate this problem (extensively studied by Kyprioti et al. 2022 using data imputation techniques), we restrict the analysis to the region of the site where at least 5% of the total number of model runs have led to non-zero water depth as shown in Fig. 3. This corresponds to an area of 7899 pixels (of 25 m² each). Figure 3 (top) provides the spatially-varying mean and standard deviation of water depth computed using the set of 243 flood maps. This shows that most of the region is impacted by a mean water depth above 0.25 m (see the blue coloured zone) with moderate-to-high variability (standard deviation > 0.50 m). The most impacted zone is located in the central urban area with a mean value ranging between 1 and 2 m (outlined in magenta colour in Fig. 3a) and high variability (standard deviation > 0.5 m).
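A design of experiments of this kind can be sketched with SciPy's quasi-Monte Carlo module. The sketch below draws 500 Sobol' points over the five offshore inputs and rescales them to the bounds quoted above; the variable names and the seed are illustrative assumptions, not the paper's actual design:

```python
# Sobol' design of experiments over the 5 offshore inputs (S, T, t0, t+, t-),
# scaled to the bounds given in the text.
import numpy as np
from scipy.stats import qmc

bounds_lo = np.array([0.65, 0.95, -6.0, 0.5, 0.5])   # S, T, t0, t+, t-
bounds_hi = np.array([2.50, 3.25,  6.0, 12.0, 12.0])

sampler = qmc.Sobol(d=5, scramble=True, seed=0)
unit = sampler.random(500)                    # 500 points in [0, 1]^5
X = qmc.scale(unit, bounds_lo, bounds_hi)     # n x d design matrix
print(X.shape)
```

Each row of X would then be fed to the hydrodynamic model to produce one maximum-water-depth map.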

Real case 2
The second case study focuses on Sainte Suzanne city (~22,000 inhabitants), located in Reunion Island along a pebble coast in the north-east part of the island (see location in Fig. 4). During the last century, it has been regularly impacted by cyclone-induced inundations (like Dina in 2002 and Dumile in 2013), notably due to wave overtopping.
To obtain realistic offshore conditions (wave and total water level time series), a database of 500 synthetic cyclones was generated (see full details in Lecacheux et al. (2021)) using 125 historical cyclonic systems (between 1981 and 2016) as follows: (1) for each cyclone track, the position where the system reached its maximum intensity is placed at the center of Reunion Island; (2) each track is shifted by randomly choosing a direction (between 0° and 355°) and a distance from the centre of the island (between 0 and 400 km); (3) for each track, the random shift was performed four times.
For each randomly-generated synthetic cyclone, the offshore waves and water levels were simulated, and were used as inputs of the modelling chain described in Lecacheux et al. (2021) to compute the distribution over space of the maximum value of the water depth during the cyclonic event. This means that the input variables used to train the metamodels are not the cyclone characteristics (wind and atmospheric pressure field, track, etc.), but the meteo-oceanic conditions (waves and water levels) that are induced offshore by the cyclones. This choice is justified by the large computational cost of the modelling chain. Among the 500 initial cyclones, a total number of 136 generated wave overtopping, which led to inundation of the city district close to the seafront (see two examples in Fig. 4, bottom), and these are considered for the training of the metamodels (the design of experiments is provided in Supplementary Materials A). From a risk assessment viewpoint, the prediction of the cases that led to the absence of flooding can be treated by relying on a classification approach as studied by Rohmer et al. (2020) on the same study site.
Based on the previous works for this site (Lecacheux et al. 2021; Rohmer et al. 2018), the set of offshore meteo-oceanic conditions is summarised using four scalar variables, namely three amplitude variables, i.e. the maximum value of the significant wave height (Hsmax), the peak period value at the time instant when the significant wave height is maximum (Tpmax) and the sea water level value at the same time instant (SWLmax), and one variable related to the time duration of the cyclone event, i.e. the time interval (HsD) during which the significant wave height exceeds a minimum threshold below which no overtopping can occur (here 2 m). In total, we consider 4 scalar input variables describing the offshore conditions x = (Hsmax, HsD, Tpmax, SWLmax).
Similarly to case 1, we restrict the analysis to the region where at least 5% of the total number of model runs have led to non-zero water depth. This corresponds to an area of 1964 pixels (of 3 m² each). Figure 5 (top) provides the spatially-varying mean and standard deviation computed using the set of 136 flood maps. This shows that most of the region is impacted by a mean water depth above 0.10 m with moderate-to-high variability (standard deviation 0.10-0.25 m). The most impacted zone is located in the south-eastern part with a mean value ranging between

Overall procedure
Let us consider y(s_j) the maximum water depth (maximum value over time during the considered event) at the j-th spatial location s_j (j = 1,…,N, with N the total number of spatial locations) defined by its geographical coordinates, and y_i the N-dimensional vector of maximum water depths related to the i-th model run, itself defined by the vector x_i of d offshore conditions. A total of n numerical model runs are performed, and the results are compiled in the form of an n × N matrix defined as Y = [y_1, y_2, …, y_n]′. The workflow of the combined PCA-kriging metamodelling approach, as schematically described in Fig. 6a, holds as follows.
Step 1 aims at reducing the dimension of the n × N matrix Y. We rely on PCA (''Dimension reduction via PCA'' Section) to transform Y into an n × T matrix Z = [z_1, z_2, …, z_n]′, where z_i is the i-th T-dimensional vector of latent output variables, with T ≪ N. Step 2 aims at training the kriging metamodel (''Kriging metamodelling'' Section) given the training data D corresponding to the n × d matrix of inputs and the n × T matrix of latent outputs Z. At this step, our objective is to explore the added value of capturing the dependence between the latent variables during the fitting process. Different metamodelling options are explored (Table 1). Step 3 aims at predicting the latent output variables given new inputs x* and at reconstructing the corresponding flooding map (using Eq. 1 described in Section ''Dimension reduction via PCA'').
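The three steps can be sketched end-to-end with scikit-learn on synthetic stand-in data; the toy "flood maps", sizes and kernel choice below are illustrative assumptions, not the paper's actual models:

```python
# Minimal sketch of Steps 1-3: PCA reduction, one kriging metamodel per latent
# variable (the "Ind" approach), prediction, and map reconstruction.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(1)
n, d, N = 200, 5, 500                      # runs, offshore inputs, pixels (toy)
X = rng.uniform(size=(n, d))               # offshore conditions (toy)
basis = rng.normal(size=(3, N))            # 3 hidden spatial modes (toy)
Y = (X[:, :3] ** 2) @ basis                # n x N "flood maps" (toy)

# Step 1: reduce Y to T latent variables (99% of variance retained)
pca = PCA(n_components=0.99).fit(Y)
Z = pca.transform(Y)                       # n x T latent outputs

# Step 2: independent kriging metamodels, one per latent variable
gps = [GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
       .fit(X, Z[:, t]) for t in range(Z.shape[1])]

# Step 3: predict latents for new inputs and reconstruct the maps
X_new = rng.uniform(size=(10, d))
Z_hat = np.column_stack([gp.predict(X_new) for gp in gps])
Y_hat = pca.inverse_transform(Z_hat)       # back to pixel space
print(Y_hat.shape)
```

The dependence-aware variants discussed below replace only Step 2; Steps 1 and 3 are unchanged.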
The visual examination of Fig. 1 for real case 1 (as well as Fig. 12 for real case 2) suggests that there are distinct subspaces where the dependence structure can be simplified. For instance, the z1-z2 relation depicted in Fig. 1 shows a bilinear trend with a break point at z1 ≈ 10 (a linear regression model provides slopes of 0.93 and -0.31 for the increasing and decreasing parts, respectively). Hence, we expect the dependence structure restricted to each subspace (defined respectively below and above z1 ≈ 10) to be easier to approximate. Further physical interpretation of these subspaces is provided in ''Application'' section. Based on these observations, we propose to build upon the method proposed by El Garroussi et al. (2022) by adding a preliminary clustering step (Step 1-2C) combined with a classification step (Step 3C); see the grey-coloured boxes in Fig. 6b. Further technical details are provided in ''Combination with clustering'' Section.
Finally, a validation protocol for assessing the predictive performance of this approach is described in ''Protocol for performance analysis'' Section.

Dimension reduction via PCA
The objective of PCA is to project Y (with potential pre-centering using the spatially-varying mean ȳ calculated over the n model runs at each of the N spatial locations) onto a lower-dimensional domain by solving the eigenvalue problem of the covariance related to Y, i.e. Y′Y. By retaining only the T largest eigenvalues, Y is approximated as follows:

Y ≈ 1_n ȳ′ + Z P′   (1)

where P is the N × T projection matrix made of the T largest eigenvectors, and Z = [z_1, z_2, …, z_n]′ is the n × T matrix of latent variables (also named principal components). The number T can be selected based on the proportion of explained inertia I = Σ_{t=1,…,T} λ_t / Σ_{j=1,…,N} λ_j, where the λ_j are the eigenvalues: T is then chosen so that I exceeds a given threshold (typically 99%).
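The eigendecomposition and the threshold-based selection of T can be sketched directly in NumPy (random stand-in data; the 99% threshold follows the text):

```python
# Sketch of the PCA step: eigendecomposition of the centred covariance and
# selection of T from the proportion of explained inertia (threshold 99%).
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=(100, 50)) @ np.diag(np.linspace(3, 0.01, 50))  # toy data
Yc = Y - Y.mean(axis=0)                      # centre with spatial mean y-bar

lam, P = np.linalg.eigh(Yc.T @ Yc)           # eigenvalues in ascending order
lam, P = lam[::-1], P[:, ::-1]               # sort descending

inertia = np.cumsum(lam) / lam.sum()
T = int(np.searchsorted(inertia, 0.99) + 1)  # smallest T with I >= 99%

Z = Yc @ P[:, :T]                            # latent variables (scores)
Y_approx = Y.mean(axis=0) + Z @ P[:, :T].T   # reconstruction as in Eq. 1
```

By construction, the squared reconstruction error equals the sum of the discarded eigenvalues, i.e. at most 1% of the total inertia.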
To provide a physical interpretation of the eigenvectors, we adopt the approach of Campbell et al. (2006) by viewing each eigenvector as a mode of spatial variability that acts as a deviation from ȳ. In our case, this allows us to identify regions where the mode of spatial variability induces water depths higher/lower than the spatial mean.

Kriging metamodelling
Let us consider the training data D corresponding to the n × d matrix of inputs and the associated n × T dataset Z of latent outputs. To account for dependence between the T latent variables z_{t=1,…,T}, different metamodelling approaches (Sections ''Independent approach'' to ''Multi-output approach'') can be adopted (see a summary in Table 1).

Independent approach
In the commonly-used approach (named Ind), to deal with this multi-dimensional output, T independent metamodels are built (one per latent variable). For a given t = 1,…,T, the n-dimensional vector z_t = (z_t^1, z_t^2, …, z_t^n) is assumed, in the context of kriging modelling (Williams and Rasmussen 2006), to be a realisation of a Gaussian process Z_t(x) with: a mean (also named trend) μ_t(x) = Σ_{j=1,…,d} β_j g_j(x), where the g_j are fixed basis functions and the β_j are the regression coefficients of the d input variables; and a stationary covariance function k_t(·,·) (named kernel), written as k_t(x, x′) = σ² r_t(x − x′) for all x, x′ by assuming k_t(·,·) to be stationary (Williams and Rasmussen 2006), with σ² a hyperparameter (named process variance) to be estimated. For new offshore forcing conditions x*, the predictive probability distribution follows a Gaussian law with mean ẑ_t(x*) and variance s_t²(x*) defined using the universal kriging equations (e.g. Roustant et al. 2012), where: C is the n × n covariance matrix between the points Z_t(x_1),…,Z_t(x_n), whose elements are C[i, j] = k_t(x_i, x_j); c(x*) is the n-dimensional vector composed of the covariances between Z_t(x*) and the points Z_t(x_1),…,Z_t(x_n); and g(x*) is the d-dimensional vector of trend function values at x*.

Fig. 6 Steps of the combined approach: Step 1C aims to identify the S clusters using the latent responses as features; Step 2C aims to train a multiclass classifier which is used in Step 3C to predict the cluster index for any new input; the metamodel related to this cluster is then used for prediction (see implementation details in ''Combination with clustering'' section). The boxes outlined by red dashed lines are the steps that remain unmodified when applying validation protocol 1 (related to the metamodel predictive capability, see ''Protocol for performance analysis'' Section).
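The universal kriging equations referred to above are standard; with the notation just defined and writing G for the n × d matrix of trend basis functions evaluated at the training inputs (G is our notation, not introduced in the text), they read (following e.g. Roustant et al. 2012):

```latex
\hat{\beta} = \left(G^{\top} C^{-1} G\right)^{-1} G^{\top} C^{-1} z_t,
\qquad
\hat{z}_t(x^*) = g(x^*)^{\top}\hat{\beta}
  + c(x^*)^{\top} C^{-1}\left(z_t - G\hat{\beta}\right),
```
```latex
s_t^2(x^*) = \sigma^2 - c(x^*)^{\top} C^{-1} c(x^*)
  + u(x^*)^{\top}\left(G^{\top} C^{-1} G\right)^{-1} u(x^*),
\qquad
u(x^*) = g(x^*) - G^{\top} C^{-1} c(x^*).
```

The last term of the variance accounts for the additional uncertainty induced by the estimation of the trend coefficients β.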

Cokriging approach
When dealing with multiple outputs, one commonly-used approach, termed coK, is based on autoregressive models, as extensively applied in the context of multi-fidelity co-kriging metamodelling (e.g., Le Gratiet 2013):

Z_t(x) = ρ_{t−1}(x) Z_{t−1}(x) + δ_t(x),

where δ_t(x) is a Gaussian process independent of Z_{t−1}(x). In many application cases, the scale factor ρ_{t−1}(x) is specified as a constant value (metamodelling approach named coKc in Table 1). However, by construction the latent variables are not linearly correlated (see e.g. Fig. 1), and coKc is expected to perform poorly. Therefore, we adopt an alternative approach named coKf by assuming a more general form ρ_{t−1}(x) = Σ_j β_j g_j(x), where the g_j are fixed basis functions and the β_j are the regression coefficients of the d input variables. In the following, we test coKf with ρ modelled as a linear function of the inputs x.
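A simplified two-stage sketch of the coKf idea is given below on toy data: the scale factor ρ(x) is estimated here by least squares (a deliberate simplification of the joint estimation used in multi-fidelity co-kriging), and the discrepancy δ is fitted with a separate Gaussian process; all variable names and the data-generating functions are illustrative assumptions:

```python
# Two-stage co-kriging sketch: z2 = rho(x) * z1 + delta(x), rho linear in x.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(3)
n, d = 150, 4
X = rng.uniform(size=(n, d))
z1 = np.sin(3 * X[:, 0]) + X[:, 1]                     # toy latent 1
z2 = (0.5 + X[:, 2]) * z1 + 0.1 * rng.normal(size=n)   # scale depends on x

gp1 = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, z1)

# rho(x) = beta0 + beta' x, estimated by regressing z2 on z1 * [1, x]
F = z1[:, None] * np.hstack([np.ones((n, 1)), X])
beta, *_ = np.linalg.lstsq(F, z2, rcond=None)
delta = z2 - F @ beta                                   # discrepancy term
gp2 = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, delta)

def predict_z2(X_new):
    z1_hat = gp1.predict(X_new)
    rho = np.hstack([np.ones((len(X_new), 1)), X_new]) @ beta
    return rho * z1_hat + gp2.predict(X_new)
```

With a constant ρ (the coKc variant), the same sketch would fail to capture the x-dependent link between z1 and z2.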

Chaining approach
A second class of metamodels relies on a chaining procedure where z_{t−1} is used as an input of the kriging metamodel for the prediction of z_t. This implies that T metamodels are built by incorporating z_{t−1} as an extra covariate to predict z_t, i.e. the new vector of input variables is w = (x, z_{t−1}), and Z_t(w) ~ GP(μ_t(w), C_w), where C_w is the covariance for the inputs w. Given the strong dependence between z1 and the subsequent latent variables z_{t=2,…,T} (see an example in Fig. 1), we propose a simplification of the chaining approach where only z1 is added as an extra covariate, i.e. the new vector of input variables is w = (x, z1) for the construction of all the T−1 kriging metamodels.

Table 1 Summary of the metamodelling approaches used to account for the dependence between the latent output variables:

- Ind (''Independent approach''): T independent kriging metamodels, one per latent variable.
- coKc (''Cokriging approach''): co-kriging with a constant scale factor.
- coKf (''Cokriging approach''): co-kriging with the scale factor modelled as a linear function of the inputs x.
- Chain (''Chaining approach''): a first kriging metamodel is trained to predict z1; z1 is then considered an extra covariate (in addition to x) of the 2nd metamodel to predict z2; z2 is then used as an extra covariate for the 3rd metamodel to predict z3, etc.
- z1 (''Chaining approach''): a first kriging metamodel is trained to predict z1; z1 is then considered an extra covariate (in addition to x) of the kriging metamodels predicting each of the subsequent z_{2,…,T}.
- MoP (''Multi-output approach''): dependence is accounted for using the parallel partial kriging model of Gu and Berger (2016).
- Clustering, approach's name indexed by ''C'' (''Combination with clustering''): the dependence structure is simplified by applying the different metamodelling approaches to subspaces of the latent output variables identified via clustering (here with the k-means algorithm). For instance, coKc-C is a cokriging approach with constant scale factor combined with clustering.

Multi-output approach
Finally, a multi-output kriging modelling approach is adopted by building a single kriging model with kernels adapted to vector-valued outputs (Alvarez et al. 2012). These approaches are however known to suffer from a higher computational burden and, to alleviate it, we consider the multi-output metamodelling approach named MoP based on the parallel partial kriging method proposed by Gu and Berger (2016). In this approach, independent kriging models are assumed for each of the T latent variables, with common estimated correlation parameters of the covariance matrix C but differing trends and process variances. Given the training data D, the prediction mean ẑ_t(x*) is calculated as a weighted mean of z_t = (z_t^1, …, z_t^n), t = 1,…,T, with the weights w(x*) calculated as:

w(x*) = R⁻¹ r(x*),

where r(x*) and R respectively stand for the analogues of c(x*) and C in terms of correlation. The reader is referred to Gu and Berger (2016) for further technical details.

Combination with clustering
To further improve the predictability of the combined PCA-kriging metamodelling approach, we propose to build upon the idea proposed by El Garroussi et al. (2022) by combining PCA with clustering: before training the metamodel, a preliminary clustering step is added to identify a limited number of S subspaces of similar ''behavior'' in the space of the latent output variables (see Step 1C in Fig. 6b). Then, one of the metamodelling approaches described in Table 1 is applied to each identified cluster (see Step 2 in Fig. 6b). In the following, we will use a k-means algorithm (Lloyd 1982) to identify the S clusters using the latent responses as features and the Euclidean distance metric.
Depending on the selected metamodelling approach, the combined clustering-PCA-kriging metamodelling approach is defined as in the following examples, considering S = 3 clusters and T = 2 latent variables for illustration.
-If the independent approach is selected, i.e. Ind, three independent kriging models are trained per latent variable by restricting the training data to the subset defined by each cluster. In total, 6 kriging metamodels are then trained. In this case, the approach is named Ind-C; -A similar procedure is applied for the chaining approaches, z1 or Chain; -If the cokriging approach, coKc or coKf, is selected, a total number of three cokriging models are trained, i.e. one cokriging model per cluster whose role is to link z1 and z2 within each cluster. In this case, the approach is named coKc-C or coKf-C depending on the type of model for the scale factor. Note that using a constant scale factor is relevant here because the distinct subspaces are expected to exhibit a simpler, almost linear, dependence structure as suggested in Fig. 1 for real case 1 (as well as Fig. 12 for real case 2); -If the multi-output approach is selected, a total number of three MoP models are trained, i.e. one model per cluster. In this case, the approach is named MoP-C.
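The Ind-C example above can be sketched as follows: k-means on the latent outputs, then one set of independent kriging metamodels per cluster (toy data, illustrative names and sizes):

```python
# Sketch of the clustering variant (Ind-C): k-means on the latent outputs,
# then S x T independent kriging metamodels, one set per cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(5)
n, d, T, S = 240, 4, 2, 3
X = rng.uniform(size=(n, d))
z1 = 4 * X[:, 0]
z2 = np.abs(z1 - 2.0)                    # toy piecewise-linear dependence
Z = np.column_stack([z1, z2])

# Step 1C: identify S clusters using the latent responses as features
labels = KMeans(n_clusters=S, n_init=10, random_state=0).fit_predict(Z)

# Step 2: per cluster, one independent kriging metamodel per latent variable
models = {s: [GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
              .fit(X[labels == s], Z[labels == s, t]) for t in range(T)]
          for s in range(S)}
print({s: int((labels == s).sum()) for s in models})   # cluster sizes
```

With S = 3 and T = 2, this indeed yields the 6 metamodels mentioned above.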
To predict the value of the latent outputs for a new inputs' configuration x Ã , the clustering procedure requires an additional step aiming at predicting the cluster index of x Ã . This can rely on a multiclass classification model (named classifier in the following); the class to be predicted being the cluster index (see Step 2C in Fig. 6b). To this end, different multiclass classifiers can be proposed (see some examples in Hastie et al. 2009). Once the cluster index of x Ã has been predicted, the metamodel trained for this cluster is used for prediction (see Step 3C in Fig. 6b).
In the following, we propose to take advantage of the bijective nature of the relation of z1 with the other latent variables that is indicated by the graphical analysis of Figs. 1 and 12. On this basis, we rely on an approach based on a univariate kriging metamodel built to predict z1. Using this kriging metamodel, the idea is to predict z1 given x* and, by taking advantage of the bijective relationship of z1 with the other latent variables, to assign x* to a unique cluster, namely the one for which ẑ1(x*) is the closest to the z1-coordinate of the cluster centroid (learned from the training dataset). Preliminary tests using more advanced classification techniques, like random forest (Breiman 2001), showed that the proposed kriging-based approach performs satisfactorily, with a lower influence of the classification error (Supplementary Materials C). On the implementation side, an additional practical advantage is to avoid the construction of a classifier and to re-use the kriging metamodel that has already been built for the majority of the metamodelling approaches of Table 1.
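The classifier-free assignment step reduces to a nearest-centroid rule on the z1-coordinate; a minimal sketch (centroid values are toy numbers, not the paper's):

```python
# Assign a new input to a cluster via its predicted z1 value: pick the cluster
# whose centroid has the closest z1-coordinate (valid when z1 relates
# bijectively to the other latent variables).
import numpy as np

def assign_cluster(z1_hat, centroids):
    """centroids: S x T array of k-means centroids in latent space."""
    return int(np.argmin(np.abs(centroids[:, 0] - z1_hat)))

centroids = np.array([[-5.0, 2.0], [3.0, 0.5], [10.0, 4.0]])  # toy values
print(assign_cluster(4.2, centroids))   # nearest z1-coordinate is 3.0 -> cluster 1
```

In the full workflow, z1_hat would come from the kriging metamodel for z1, and the metamodel set trained for the returned cluster would then produce the final prediction.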

Protocol for performance analysis
The objective is to assess the predictive capability of the combined PCA-kriging metamodelling approach, i.e. whether it is capable of predicting flood maps given ''yet-unseen'' input configurations, i.e. samples that have not been used for training. This is examined by using a K-fold cross-validation approach (e.g. Hastie et al. 2009: Sect. 7.10) adapted to our case.
As schematically depicted in Fig. 6, different methods are combined (PCA, kriging metamodel, clustering, and classifier), which implies that the predictive capability of the combined PCA-kriging metamodelling approach depends on different sources of error. Since our primary objective is to assess whether accounting for the dependence structure in the metamodel construction impacts the predictive performance (Question Q1 described in the introduction), we define a first validation protocol (named Protocol 1) that aims at isolating the predictive capability of the metamodels (described in Table 1) alone by keeping the PCA and clustering/classification steps unmodified during the cross-validation procedure (boxes outlined by dashed red lines in Fig. 6). This means that, in practice, the latent variables are learned using the whole dataset, and the identification and the assignment to the clusters are determined using the whole dataset. Full details of the protocol implementation are provided in Appendix A.
In Protocol 1, the performance is analysed in two steps. The first step consists in studying the predictive capacity of the metamodelling approaches for the latent output variables z_{t=1,…,T}. This step is mainly of interest from a methodological point of view, as it allows us to study whether taking into account the dependence structure has an impact on the predictive capability for z_t, in particular for those of high order with t close to T. By construction, these latent variables are noisier (the proportion of inertia is lower), and thus more difficult to approximate, but they are linked to different modes of spatial variability (as discussed in Sect. 4). For a given t = 1,…,T, we use the coefficient of determination R²_t defined as follows:

R²_t = 1 − Σ_{i=1}^{n} (z_{t,i} − ẑ_{t,i})² / Σ_{i=1}^{n} (z_{t,i} − z̄_t)²,

where z̄_t is the average of the numerically computed values z_{t,i} with t = 1,…,T, and ẑ_{t,i} is the prediction for the t-th true latent output. A coefficient R² close to 1.0 indicates that the metamodel is successful in matching the new observations that have not been used for training. A second step investigates the predictive capability of the metamodelling approaches for the reconstructed flood map. We use the mean absolute error MAE(s) to measure the predictive capability at spatial location s, averaged over the cross-validation iterations. This performance indicator is defined as:

MAE(s) = (1/n) Σ_{i=1}^{n} ae_i(s),

where ae_i(s) = |y_i(s) − ŷ_i(s)| is the absolute error, with y_i(s) the true water depth at location s given the i-th input offshore conditions, and ŷ_i(s) the corresponding prediction using the combined PCA-kriging metamodelling approach. We preferably choose this performance indicator (over R², for instance) because it is expressed in physical units, here meters, which makes its interpretation easier for risk practitioners with respect to the magnitude of the flood-induced water depth.
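The two indicators defined above can be sketched as follows (Python with NumPy, our assumption; the function names are hypothetical):

```python
import numpy as np

def r2_latent(z_true, z_pred):
    """Coefficient of determination for one latent output:
    R2 = 1 - sum_i (z_i - zhat_i)^2 / sum_i (z_i - zbar)^2."""
    z_true, z_pred = np.asarray(z_true), np.asarray(z_pred)
    ss_res = np.sum((z_true - z_pred) ** 2)
    ss_tot = np.sum((z_true - z_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mae_map(Y_true, Y_pred):
    """MAE(s) at each spatial location s (pixel), averaged over the n
    validation samples: mean_i |y_i(s) - yhat_i(s)|.

    Y_true, Y_pred : (n, n_pixels) arrays of water-depth maps (metres)."""
    return np.mean(np.abs(np.asarray(Y_true) - np.asarray(Y_pred)), axis=0)
```

In the cross-validation procedure, `r2_latent` would be applied per latent variable and per repetition, and `mae_map` to the reconstructed maps of the held-out fold.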
To assess the cumulative impact of the clustering / classification error, a second validation protocol (Protocol 2) is defined by using the sub-sampled training dataset during the cross-validation iterations to apply the clustering as well as the training of the classifier (Steps 1C and 2C in Fig. 6b). This means that the clustering and classification are learned with a lower number of training samples (90% of the total number within the tenfold cross-validation procedure). Finally, to assess the impact of the PCA error (in addition to the clustering / classification errors), a third protocol (Protocol 3) is defined by applying PCA (and learning the latent variables) using a subset of the training dataset. These two additional validation protocols are used in the ''Discussion'' Section to discuss the robustness of the performance results described in Sect. 4.

Application
In this section, we first apply PCA for each application case described in the ''Real case applications'' Section, and analyse the dependence structure of the latent variables as well as the corresponding eigenfunctions. On this basis, we apply validation protocol 1 (described in Sect. 3.5) using a tenfold cross-validation procedure (repeated 25 times) to evaluate the performance of each of the metamodelling approaches described in Table 1 (i.e. accounting for the metamodel error only). Validation protocols 2 and 3 (accounting for the errors related to PCA and to the clustering / classification step in addition to the metamodel error) are applied in the ''Discussion'' Section to test the robustness of our results. In the following, we assume for all considered kriging models a linear trend, a Matérn 5/2 kernel and a nugget effect. The hyper-parameters of the kriging models are estimated through maximum likelihood (e.g. Roustant et al. 2012), except for the multi-output model MoP, whose hyper-parameters are estimated using the method proposed by Gu and Berger (2016) based on the mode of the marginal posteriors. For the approaches combined with clustering, three clusters are assumed. This choice is further discussed in the ''Number of clusters'' Section.
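As an illustrative sketch only (the paper's kriging models are fitted with dedicated tools, e.g. those of Roustant et al. 2012; here we assume scikit-learn's Gaussian process implementation, which supports only a constant trend rather than the linear trend used in the paper, and the data are a synthetic stand-in), a Matérn 5/2 model with a nugget effect and maximum-likelihood hyper-parameter estimation could look like:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(40, 2))   # stand-in for offshore forcing conditions
z = np.sin(3.0 * X[:, 0]) + X[:, 1]       # stand-in for one latent output z_t

# Matern 5/2 kernel with a nugget effect (WhiteKernel); hyper-parameters
# (length scales, variance, noise) are estimated by maximum likelihood at fit time
kernel = ConstantKernel(1.0) * Matern(length_scale=[0.2, 0.2], nu=2.5) \
         + WhiteKernel(noise_level=1e-4)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X, z)
z_hat = gp.predict(X)                     # near-interpolation at training points
```

One such model would be trained per latent variable (Ind approach), or per latent variable and per cluster (Ind-C).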

Application of PCA
Using the spatial mean (Fig. 3a), we first centre the data and apply PCA. The analysis of the inertia shows that three latent outputs are necessary to explain ~99% of the variance (with respective contributions of 92, 6 and 1%). The analysis of the PCA reconstruction error shows satisfactory results, with a MAE (calculated on the 243 maps) reaching a median value, calculated over space, of 0.04 m (with an inter-quartile range of 0.03 m). Preliminary tests showed that PCA applied to the whole dataset of 500 maps (including those without flooding) led to a larger reconstruction error (median MAE of 0.23 m with an inter-quartile range of 0.14 m). In addition, the reconstruction error for the maps without flooding indicated a median MAE of 0.08 m (with an inter-quartile range > 0.08 m). This difficulty of PCA in dealing with zero-valued data is in agreement with the literature (see e.g., Zeng et al. 2022), and supports our choice to restrict the analysis to maps with flooding. The discussion is extended in the ''Concluding remarks and further work'' Section. Figure 7 depicts the corresponding eigenvectors over space. On the one hand, Fig. 7a indicates that the first eigenvector has positive values over the whole territory. This means that this first mode of spatial variability induces maximum water depths that are shifted above the spatial mean over the whole domain. The magnitude of the positive deviation varies over space (as marked by the differences in colour brightness in Fig. 7), in particular with a higher intensification at longitude −1.06° and latitude 46.05°, at the transition between the urban area and the marshland (see Fig. 2 for the site description).
On the other hand, Fig. 7(b, c) indicates that the second and third components act differently over space, with some regions exhibiting positive changes outlined with warm colours (respectively, negative values outlined with cold colours), i.e. leading to water depths shifted above (below) the spatial mean. The second mode induces strong water depths in the central regions, where the main urban area (Fig. 2) is located and where the water depths are the highest on average (Fig. 3a), whereas the third mode induces strong water depths mostly in the marshland (south-eastern part of the territory). We also note that the identified regions approximately follow some delineations that correspond to the main roads crossing the territory.
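The PCA step described above (centring by the spatial mean, retaining the leading latent outputs, and measuring the reconstruction MAE) can be sketched as follows; the data here are a synthetic stand-in, not the actual flood maps, and the use of scikit-learn is our assumption:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the n = 243 flooded-case maps (rows) over the pixels
# (columns); the true maps have ~8000 pixels, reduced to 500 here for brevity
rng = np.random.default_rng(1)
Y = rng.normal(size=(243, 500)) * np.linspace(2.0, 0.1, 500)  # decaying variance

pca = PCA(n_components=3)           # retain T = 3 latent outputs
Z = pca.fit_transform(Y)            # latent variables z_1..z_3 (centring is internal,
                                    # equivalent to removing the spatial mean map)
Y_rec = pca.inverse_transform(Z)    # PCA reconstruction of the maps
mae_pixel = np.mean(np.abs(Y - Y_rec), axis=0)   # reconstruction MAE per pixel
inertia = pca.explained_variance_ratio_          # proportion of inertia per component
```

On the real data, `inertia` would show the 92/6/1% split reported above, and the median of `mae_pixel` the 0.04 m reconstruction error.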
As underlined in the introduction, the examination of the joint distribution of the three latent variables z_{t=1,…,3} shows a clear non-linear structure (Fig. 1). The application of the k-means clustering algorithm with 3 clusters allows us to identify subspaces with simplified structures (coloured dots in Fig. 8, left). For instance, each pairwise relation between the latent variables approximately follows a linear trend for the grey cluster. The spatial distributions of water depth (measured with respect to the spatial mean) that are the closest to the k-means-derived centroids are plotted in Fig. 8 (right). These cases are representative of the characteristics of the flood maps within each cluster. This shows that Case (c) is related to a flood of wide spatial extension with high water depths well above the spatial mean (indicated by the warm colours): this is in agreement with the high positive z_1 value, which induces a high contribution of the first mode of variability (Fig. 7a). Case (b) is related to a flood that is mostly concentrated in the central zone with water depths mostly below the spatial mean (indicated by the cold colours): this is in agreement with z_{1-3} values close to zero. Case (a) is related to a flood that also has a wide spatial impact but is mostly concentrated in the zones related to the second and third modes of spatial variability, i.e. in the urban area (Fig. 7b) and in the Yves marshland (Fig. 7c), which is related to a moderate value of z_1 and moderate-to-high values of z_2 and z_3.
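A minimal sketch of this clustering step (Python with scikit-learn, our assumption; the synthetic latent variables only mimic the kind of piecewise-linear structure seen in Fig. 8, they are not the real data):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the latent variables z_1..z_3, with a bilinear
# dependence of z_2 on z_1 and a quadratic dependence of z_3 on z_1
rng = np.random.default_rng(2)
z1 = rng.uniform(-20.0, 20.0, 300)
Z = np.column_stack([z1,
                     np.where(z1 < 0.0, 1.5 * z1, -0.5 * z1),
                     0.1 * z1 ** 2])
Z += rng.normal(0.0, 0.5, Z.shape)   # add scatter around the trends

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Z)
labels, centroids = km.labels_, km.cluster_centers_
# Within each cluster, the pairwise relations between the latent variables are
# close to linear, so one metamodel per latent variable (Ind-C) or a
# multi-output metamodel (MoP-C) can then be trained per cluster.
```

The `centroids` array is what the z_1-based classification rule of Sect. 3 compares against at prediction time.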

Performance analysis
We first analyse in Fig. 9 the predictive performance with respect to each latent variable using R² as a performance indicator (defined in ''Kriging metamodelling'' Section). Several observations can be made:

- All considered metamodelling approaches achieve very satisfactory predictive performance for z_1, with R² > 90%. The differences lie in the prediction of z_2 and z_3;
- The classical Ind approach reaches satisfactory predictive performance for z_1 (R² > 90%), but only a moderate level of predictive performance for z_2 (with a median R² value not larger than 70%) and an even lower one for z_3 (with a median R² value not larger than 25%);
- Fig. 9 (middle, right) underlines the added value of accounting for the dependence structure: all considered metamodelling approaches achieve a higher predictive performance, by 5-10% in terms of R² value, with the exception of coKc;
- coKc, whose poor performance was expected due to the absence of linear correlation between the latent variables (see Fig. 1), presents high variability in R² (outlined by the length of the whiskers of the boxplots) across the repetitions of the validation procedure. Its performance is even worse than Ind, as clearly indicated by the boxplot for z_3;
- Simplifying the dependence structure through clustering in the latent variables' space significantly improves the prediction performance, especially for z_3;
- Two approaches reach particularly high predictive performance (with median R² values close to 90% and 75% for z_2 and z_3, respectively), namely Ind-C and MoP-C;
- Though achieving a slightly lower prediction performance for z_2, we also note the higher performance level of the coKf-C procedure for z_3. This suggests that cokriging has the potential to achieve a low prediction error provided that the scale factor adequately captures the dependence structure;
- The metamodelling approaches integrating z_1 as a covariate (Chain-C and z_1-C), despite their simplicity, allow a high predictive performance for z_3. This performance remains moderate for z_2.

[Fig. 9 caption] Boxplots of the R² performance indicator (computed by repeating 25 times the tenfold cross-validation procedure) for the prediction of the first three latent outputs, namely z_1 (left), z_2 (middle), and z_3 (right), for real case 1 considering the different PCA-kriging metamodelling procedures described in Table 1. The blue, green and red colours indicate the reference solution (Ind approach), and the solutions without and with clustering (using 3 clusters), respectively. Note the different scales of the x-axis.
The performance results are now analysed spatially by considering the second performance indicator, MAE over space (defined in ''Kriging metamodelling'' Section). Figure 10 provides the map of MAE (averaged across the 25 iterations of the cross-validation procedure) considering the Ind approach and the three best performing approaches, with or without clustering in the latent variables' space. Figure 10 shows that, without accounting for the dependence structure, the Ind approach achieves a MAE < 0.1 m on ~66% of the total number of pixels. Figure 10 also shows that it is mainly the combination with clustering (bottom panels) that leads to the largest MAE decrease, of ~0.05 m, in the north-eastern part (note the colour change from dark to light green) as well as in the southern part in the Yves marsh (note the removal of the orange patches), although a small MAE increase can be noticed in the central region. The best performing MoP-C approach achieves a MAE < 0.1 m on ~76% of the total number of pixels, i.e. an increase of ~10% compared to Ind. This is in agreement with the higher predictability for z_3 outlined in Fig. 9, since these zones are more particularly related to the third mode of spatial variability (Fig. 7c).
[Fig. 10 caption fragment: maps of MAE for the metamodelling approaches described in Table 1; backgrounds of the maps are extracted from the Bing map dataset via OpenStreetMap (https://www.openstreetmap.org/).]

To get a more quantitative picture of the spatial predictability, we analyse the performance per class of mean water depth (defined by Fig. 3a), namely: < 0.25 m (corresponding to pixels mainly located in the north-eastern part, see Fig. 3a, ~34% of the total number of pixels), 0.25-1.0 m (~63% of the total number of pixels), and > 1.0 m (~3% of the total number of pixels); although this last class represents the smallest number of pixels, its importance from the point of view of risk management is greater because these pixels are located in the main urban area, i.e. the most vulnerable region of the territory. Figure 11 provides the statistics (across the 25 repetitions of the cross-validation procedure) of MAE averaged over the spatial locations corresponding to the three classes of mean water depth. The performance is analysed both in terms of MAE median values and in terms of MAE dispersion across the repetitions of the cross-validation procedure (indicated by the width of the boxplot endpoints). Several observations can be made:

- Apart from coKc, whose poor performance was expected, the cokriging-based approaches appear to have the potential to reach low prediction errors (note the lower endpoint of the whisker) for all classes of mean water depth, but the variability across the iterations of the cross-validation procedure remains very high, hence indicating poor stability of the results. This result is consistent with the analysis of the R² boxplots in Fig. 9;
- Fig. 11 (right) shows that in areas impacted by high water depths, the best performing approaches are MoP-C and Ind-C. This is consistent with the high predictive capability highlighted in Fig. 9 for z_2, i.e. the mode that induces high water depths in the central regions where the main urban area is located (Fig. 7b);
- Fig. 11 (middle and left) confirms the satisfactory performance of MoP-C for smaller water depths, although a slightly poorer (but also less dispersed) performance is highlighted compared to that of the chaining-based approaches, in particular the approach that incorporates z_1 as a new covariate (with or without clustering);
- A lower (although still satisfactory compared to Ind) performance is outlined for Ind-C, which is here mainly explained by a lower R² value for z_3, i.e. a mode that controls strong water depths mostly in the south-eastern part of the territory (Fig. 7c), where most pixels are impacted by a mean water depth of the order of 0.25-0.50 m (Fig. 3a).

Real case 2
In this section, we investigate the transferability of our results to another context, i.e. real case 2, characterised by wave overtopping-induced flooding on a smaller urban area than case 1 (see ''Real case applications'' Section). We apply the same type of analysis as for case 1. For the sake of conciseness, we summarise here the main results; full details are provided in Supplementary Materials B. Figure 12 presents the results of the PCA (with prior pre-centering using the spatial mean depicted in Fig. 5a).
The analysis of the inertia shows that six latent outputs are necessary to explain > 99% of the variance. Yet, the last three latent outputs contribute less than 1% of the variance, and we choose to retain only the first three latent outputs, with respective contributions of 87, 7.2 and 3.7%. The analysis of the PCA reconstruction error shows satisfactory results, with a MAE (calculated on the 136 maps) reaching a median value, calculated over space, of 0.01 m (with an inter-quartile range < 0.01 m). Similarly to case 1, tests including the whole dataset of 500 maps (including those without flooding) also indicate a larger PCA reconstruction error (median MAE of 0.08 m with an inter-quartile range of 0.07 m). Figure 12 confirms the non-linear dependence structure, which shares some similarities with the one for case 1 (Fig. 1). The z_1–z_2 relation shows a bilinear trend with a break point at z_1 ≈ −10 (a linear regression model provides slopes of 1.46 and −0.45 for the increasing and decreasing parts, respectively). The z_1–z_3 relation, though more dissimilar to that of case 1, also presents three distinct zones. Therefore, the visual inspection of Fig. 12 suggests the relevance of using three clusters in the space of the latent variables (the influence of this choice is further discussed in the ''Number of clusters'' Section).
We investigate the performance of the different kriging metamodelling approaches (Table 1) by applying validation protocol 1. Since the analysis of the R² performance indicator (Supplementary Materials B: Fig. B3) shows strong similarities with the one for case 1, we only present the analysis of MAE (Fig. 13). This identifies Ind-C and MoP-C as the best performing approaches (the corresponding maps are provided in Supplementary Materials B: Fig. B4), both in terms of MAE median values and in terms of MAE dispersion across the repetitions of the cross-validation procedure (indicated by the width of the boxplot endpoints), with a particularly higher performance for high water depths (> 1 m, Fig. 13, right). In this case, the cokriging-based approaches may improve the predictive capability, but the results remain too dispersed across the repetitions. Contrary to case 1, Ind-C here shows the higher predictive performance.

[Fig. 11 caption] Boxplots of the MAE performance indicator (computed by repeating 25 times the tenfold cross-validation procedure) averaged over the spatial locations corresponding to three classes of mean water depth, namely < 0.25 m (left), 0.25-1 m (middle) and > 1 m (right), for real case 1 considering the different PCA-kriging metamodelling procedures described in Table 1. The blue, green and red colours indicate the reference solution (Ind approach), and the solutions without and with clustering, respectively. Note the different scales of the x-axis.

Added value of accounting for the dependence structure
We have tested different combined PCA-kriging metamodelling approaches (Table 1) to predict flood maps in two different contexts, namely overflow-induced flooding on a relatively large area of several km² with heterogeneous land covers (real case 1), and wave overtopping-induced flooding on a small urban area in a cyclonic context (real case 2). Despite the differences in the flooding drivers as well as in the size of the region impacted by flooding, it is interesting to note the relatively close similarities in the dependence structure of the latent output variables (Figs. 1 and 12), and, to some extent, in the main modes of spatial variability of the water depth as well (Fig. 7 and Supplementary Materials B: Fig. B1). This suggests that the physical processes underlying the flooding-induced water propagation within the territory are similar.
Regarding question (Q1) described in the introduction, our performance analysis (''Application'' Section) applied to both real cases shows similar results, i.e. taking into account the complexity of the dependence structure in the metamodelling procedure has the potential to improve the predictive capability of the metamodel. This result is also confirmed by examining an alternative performance indicator that has no physical unit (Supplementary Materials D).
Regarding question (Q2), our tests highlight the added value of combining the approach with clustering. The high predictive performance is explained by a double effect: (1) within a given cluster, the kriging metamodels are learned using flood maps that share common similarities, as analysed in the ''Application of PCA'' Section, the three cases (a-c) in Fig. 8 being the representative flood maps of the three clusters. This first effect contributes, in particular, to the satisfactory performance results of Ind-C; (2) within a given cluster, the dependence structure between the latent variables is simplified, and can thus be learned more easily. This second effect contributes, in particular, to the satisfactory performance results of the multi-output approach MoP-C. We note, however, that despite this simplification, the combined clustering-cokriging approach still presents poor performance results (as shown by the excessively high variability in the results across the repetitions of the cross-validation procedure), even when using a scale factor that is a function of the covariates (coKf approach).
Beyond our application cases, these results suggest that there are benefits in carefully examining and incorporating the dependence structure in the multiple studies indicated in the introduction that relied on the Ind approach alone. A necessary condition, however, is to investigate in more detail the robustness of the performance results to our main assumptions. Two aspects are discussed in more depth in the following: the influence of the number of clusters (''Number of clusters'' Section), and the impact of the cumulative error related to PCA and clustering/classification (''Cumulative influence of PCA and clustering/classification error'' Section).

Number of clusters
In the ''Application'' Section, three clusters were assumed. This choice was primarily dictated by the visual inspection of the latent variables in Figs. 1 and 12. To test the influence of this choice, we apply validation protocol 1 by varying the number of clusters S from 2 to 5, and we analyse MAE per class of mean water depth for the best performing approaches, Ind-C and MoP-C. On the one hand, Fig. 14 (top) shows for case 1 that an optimal number of clusters exists for MoP-C, either 3 or 4, as indicated by the lower MAE values. For mean water depths < 1 m (Fig. 14, top-right), the optimal number of clusters for Ind-C is closer to 3, but it remains less evident to identify for the other classes of mean water depth.
On the other hand, Fig. 14 (bottom-middle and right) shows that, for mean water depths > 0.25 m, the performance results reach a plateau for S ≥ 3: when increasing S, MAE remains approximately at the same level. This indicates that increasing S in these cases does not much influence the results, and that our choice was here optimal. For mean water depths < 0.25 m, Fig. 14 (bottom-left) shows, however, a slightly decreasing tendency of the MAE statistics when increasing S. Testing larger S values was not possible here because the cross-validation procedure was made challenging by the small size of some clusters.
This analysis highlights that the predictive performance is not a monotonic function of the number of clusters. This means that over-simplifying the dependence structure (by means of a large number of clusters) is of no benefit, and analysing the trade-off between simplification and stability of the performance results (in close relation to the size of the training dataset) is a necessary preliminary step of the procedure.
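As a complement to the visual inspection and the MAE-based sensitivity analysis used here, a common heuristic (not the one used by the authors) for screening candidate numbers of clusters is the silhouette score; a sketch, with a hypothetical helper name:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def score_cluster_numbers(Z, candidates=(2, 3, 4, 5), seed=0):
    """Score each candidate number of clusters S on the latent variables Z
    with the silhouette coefficient (higher is better)."""
    scores = {}
    for S in candidates:
        labels = KMeans(n_clusters=S, n_init=10, random_state=seed).fit_predict(Z)
        scores[S] = silhouette_score(Z, labels)
    return scores
```

Such a screening does not replace the stability check against the training-set size discussed above, but it gives a cheap first filter on S before running the full cross-validation.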

Cumulative influence of PCA and clustering/classification error
The performance analysis in the ''Application'' Section focuses on the metamodel predictive capability by applying protocol 1 (described in the ''Protocol for performance analysis'' Section). To assess the cumulative effect of the error related to the PCA reconstruction as well as to the clustering / classification step, we apply the two other protocols, 2 and 3 (described in Appendix A), to the metamodelling approaches Ind-C and MoP-C. For case 1, Fig. 15 shows that our conclusions can still be considered valid regarding the higher performance of Ind-C and MoP-C, more specifically for mean water depths ≤ 0.25 m (Fig. 15, top-left), and to some extent for mean water depths > 1 m as well (Fig. 15, top-right): overall, the MAE values remain below those of Ind (which only accounts for the metamodel error). For this case, the moderate differences between the performance results indicate that the predictive capability of the combined PCA-kriging metamodelling approach is here primarily driven by the kriging metamodel predictive error: accounting for the classification/clustering error alone, or for both sources of error, only moderately impacts the performance.

[Fig. 13 caption fragment: … the metamodelling procedures described in Table 1. The blue, green and red colours indicate the reference solution (Ind approach), and the solutions without and with clustering (with 3 clusters), respectively. Note the different scales of the x-axis.]
For case 2, Fig. 15 (bottom) shows a different tendency. Due to the additional account of the PCA and of the classification/clustering error, Ind-C and MoP-C become less competitive compared to Ind, with the exception of large mean water depths (> 1 m, Fig. 15, bottom-right) and, to some extent, of mean water depths of 0.25-1 m as well (Fig. 15, bottom-middle). This lower performance should here be interpreted in close relation to the relatively moderate number of training data. Only 122 training data (i.e. 90% of 136 in a tenfold cross-validation procedure) are used to fit the whole procedure, i.e. metamodel training and learning of the dependence structure, of the latent variables and of the clusters. This effect is less pronounced for case 1 because of the higher number of training data (218). Furthermore, the implementation of metamodelling techniques is made more difficult in case 2 due to the higher sensitivity of water depths to small variations in forcing conditions, because of the smaller size of the area affected by the flood (Lecacheux et al. 2021).

[Fig. 14 caption fragment: for instance, MoP-C4 indicates the MoP kriging metamodel combined with k-means using 4 clusters. The blue colour outlines the performance results for Ind-C and MoP-C as described in the ''Application'' Section. Note the different scales of the x-axis.]
Finally, as shown in the ''Number of clusters'' Section, the performance for case 2 may also be related to the tuning of the clustering approach. Additional tests using 2 clusters for case 2 showed, however, similar results, i.e. a lower performance when the errors related to PCA and to classification/clustering are accounted for.

Concluding remarks and further work
To overcome the problem of the high dimensionality of a hydrodynamic numerical model's output (e.g. maps of water depth in the context of marine flooding), a commonly-used method is to apply principal component analysis (PCA). In the two real cases of marine flooding, the dimensionality could be reduced from ~8000 (case 1) and ~2000 (case 2) to 3, respectively. We show that the predictive performance of the combined PCA-kriging metamodelling approach can be significantly improved (with MAE reduced by several centimeters in both considered real cases) when the dependence structure between the latent variables is accounted for. Our tests show that the most efficient method is to combine the approach with clustering (as originally proposed by El Garroussi et al. (2022)) in the space of the latent variables (here with the k-means algorithm). Further expanding the approach by relying on a metamodel specifically dedicated to handling vector-valued variables (here the multi-output kriging metamodel developed by Gu and Berger 2016) allows an additional increase of predictability. The complementary sensitivity tests performed in the ''Discussion'' Section highlight, however, that this higher predictability is achievable only provided that two conditions are jointly met: (1) the number of training samples is sufficiently high to learn the dependence structure over the respective clusters; (2) a careful selection of the number of clusters has been performed.
It should however be noted that these results are valid for the two real cases considered here as well as for our assumptions regarding the design of the computer experiments and the training of the kriging metamodels; in particular, the performance may be influenced by the kernel choice (see e.g., Stephenson et al. 2022) and by the method for the hyper-parameter estimation (see e.g., Bachoc 2013). Despite these limitations, these results provide a justification for investigating the dependence structure at any new study site to see whether an improvement in predictability can be achieved over the widely-used Ind approach.

[Fig. 15 caption] Robustness analysis of the MAE results to the PCA and clustering / classification error for both real cases (respectively top and bottom panels) with respect to each class of mean water depth (left: < 0.25 m, middle: 0.25-1 m, right: > 1 m) considering the metamodelling approaches Ind-C and MoP-C. The term Err(P,C) indicates a fitting process that accounts for both the PCA and clustering / classification errors (in addition to the metamodel error). The term Err(C) indicates a fitting process that accounts for only the clustering / classification errors. The blue and red colours respectively indicate the performance results for the Ind approach, and for Ind-C or MoP-C taking into account only the metamodel error (as described in the ''Application'' Section). Note the different scales of the x-axis.
Our study focuses on one aspect of the problem of metamodelling with high-dimensional output, and several lines for further improvement are identified. First, alternative advanced metamodels are worth investigating, either alternative multi-output kriging formulations (see e.g. Nabati et al. 2022; de Wolff et al. 2021; Hankin et al. 2012; Rougier 2008; López-Lopera et al. 2022) or other types of metamodels like neural networks. In particular, deep learning techniques have recently shown high performance in flood management (Bentivoglio et al. 2022), but the question of their feasibility in a context of limited training data (in our case, about 200 samples are available for training) has to be addressed in future studies. Second, among the computed maps, we focused on the cases that led to flooding for metamodel training. A possible extension of this study is to add a classification step upstream of the proposed workflow to identify the forcing conditions that lead to flooding; this is for example studied by Rohmer et al. (2018) and Rohmer et al. (2020) using a random forest classification technique applied to real cases 1 and 2, respectively. Solving this problem is complicated by the complexity of the flood maps, which have discontinuous heterogeneous patterns (especially related to land covers) and non-negative, zero-inflated data (many cases lead to the absence of flooding). Such complex data cannot be easily processed using linear techniques such as PCA. Complementing the approach with alternative clustering approaches is worth investigating, either to overcome the problem of ''dry nodes'' (Kyprioti et al. 2022) or with additional features to improve the predictive performance (Kyprioti et al. 2021).
In addition, more advanced techniques, that allow both non-linear dimension reduction methods and an inverse transformation from latent to original space, deserve to be investigated in future research; a promising candidate is autoencoders (Baldi 2012) as studied by El Garroussi et al. (2022) for river flooding.
with i = 1, …, n^(k). Steps 3 to 5 are then repeated for the K folds. The afore-described validation protocol 1 is adapted for the approaches combined with clustering as follows: Step 1 is completed by applying clustering on the whole dataset (prior to the cross-validation iteration in Step 2) to identify S clusters; Step 3 is modified by training a metamodelling approach on each of the S subsets of D^(−k) defined based on the clustering. Since the cluster indices are identified prior to Step 2, no classifier is needed here (contrary to validation protocols 2 and 3 described below).
To assess the cumulative impact of the clustering / classification error, a second validation protocol (named protocol 2) is defined by modifying protocol 1 as follows:

- The clustering algorithm is applied on D^(−k) at Step 3 (instead of applying it on the full dataset D at Step 1);
- A classifier is learned at Step 3 to assign the k-th part of the data to one of the identified clusters.
Finally, to assess the impact of the PCA error (in addition to the clustering / classification errors), a third validation protocol (named protocol 3) is defined by applying PCA (and learning the latent variables) using D^(−k) (instead of using the full dataset D).
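Protocol 3, which nests the dimension-reduction and clustering steps inside the cross-validation loop, can be sketched as follows (Python with scikit-learn, our assumption and not the paper's toolchain; names are hypothetical, and the metamodel-training and classification steps are only indicated by comments):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold

def protocol3_folds(Y, n_folds=10, T=3, S=3, seed=0):
    """Validation protocol 3: PCA and clustering are re-learned on the
    training part D^(-k) of each fold, so that the PCA and clustering /
    classification errors accumulate with the metamodel error."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(Y):
        pca = PCA(n_components=T).fit(Y[train_idx])     # PCA on D^(-k) only
        Z_train = pca.transform(Y[train_idx])           # latent variables
        km = KMeans(n_clusters=S, n_init=10,
                    random_state=seed).fit(Z_train)     # clusters on D^(-k) only
        # here: train one metamodel per latent variable and per cluster,
        # then assign the held-out samples to a cluster (classifier or
        # the z_1-based rule) before predicting and reconstructing the maps
        yield train_idx, test_idx, Z_train, km.labels_, pca, km
```

Protocol 2 corresponds to the same loop with the `PCA` fit moved outside it (applied once to the full dataset), while protocol 1 moves both the `PCA` and `KMeans` fits outside.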
Author's contribution JR designed the concept. JR and CS set up the methods. JR undertook the statistical analyses. SL, DI and RP set up the numerical codes and performed the computer experiments. JR, CS, SL, and DI analysed and interpreted the results. JR wrote the manuscript draft. JR, CS, SL, DI, and RP reviewed and edited the manuscript.