Assessing Partial Triadic Analysis with Maximum Entropy Bootstrap: an application to BES Italian education indicators

In this paper we analyse a set of Italian equitable and sustainable well-being (BES) indicators, related to the education and training domain, observed at the regional level over different years. To this end, the Partial Triadic Analysis (PTA), a multiway approach which makes it possible to deal properly with time-dependent data structures, is used in conjunction with a bootstrap method to assess the outcomes obtained. The adopted resampling scheme, called Maximum Entropy Bootstrap, proved to be an appropriate analytical tool, given its ability to preserve the longitudinal data structure without incurring the restrictive assumptions of more traditional bootstrap methods. To the best of the authors' knowledge, this is the first time a bootstrap method for longitudinal data has been employed in PTA.


Introduction
The Partial Triadic Analysis (PTA) was proposed at the end of the 1970s by Jaffrenou (1978) as a multi-table analysis technique aimed at efficiently extracting valuable information from complex structures. Derived from triadic analysis, PTA (first applied in the field of ecology by Thioulouse et al. (1987) and subsequently renamed by Kroonenberg (1989)) follows the same analytical approach as the STATIS method (Lavit (1994)). In essence, it is a powerful statistical tool fruitfully employed in both theoretical and applied research characterised by three-way information sets, which can be represented as a sequence of data matrices (arrays) sharing the same structure. In particular, in the case of repeated measurements on the same units, PTA captures both the structural relationships and the dynamics of the observed units with respect to a set of variables.
The PTA, as well as other STATIS-related methods, has been successfully employed in many fields of research, including social and economic sciences (see, among others, Bolasco (1986), Bolasco (1992), Bolasco et al. (1992)).
A careful evaluation of the outcomes generated by this technique has been conducted using a suitable resampling method (illustrated in Section 5). To the best of the authors' knowledge, this is the first time a bootstrap method able to take into account the longitudinal structure of the data has been employed for this purpose. In more detail, we will show how bootstrap-based confidence intervals can greatly enhance the amount of valuable information that can be extracted from a longitudinal data set, even when the length of the time series is of moderate size.

Description of the data
On a yearly basis, the Italian National Institute of Statistics (Istat) issues a report focusing on equitable and sustainable well-being (BES) (Istat (2020)). The driving force behind it is to offer an integrated picture of the main economic, social and environmental interactions, with particular emphasis on territorial aspects. In more detail, the report publishes a set of 130 basic indicators grouped into 12 domains.
The structure of the BES indicators tends to evolve according to the growing attention paid, at the European level, to innovative measurement systems and state-of-the-art projects devoted to deepening the relationships between economic policies and i) the objectives of well-being, equity and sustainability, as well as ii) the determinants of sustainable and inclusive economic growth.
In accordance with the applicable Italian law, the BES indicators must always be taken into consideration whenever a new economic policy impacting the quality of citizens' life is implemented. This is consistent with the goal pursued by these indicators, which is mainly the evaluation of the progress achieved by a society as a whole. In fact, they have been specifically designed to capture vital information about the quality of life of a generic citizen over his or her lifespan. Out of the group of BES basic indicators, in this paper we restrict our attention to those belonging to the education and training domain, considered at the NUTS 2 (regional) level, for the period 2008-2017. In particular, the following indicators will be considered:
- Participation in the school system of children aged 4-5 (kinder): percentage of children, aged 4-5 years, participating in pre-primary or primary education on the total of children aged 4-5 years;
- percentage of people with a secondary education (hi sc);
- percentage of people with a tertiary education (degree);
- percentage of people aged 25-64 participating in formal or non-formal education (cont tr);
- percentage of people not in education, employment or training (neet).
For the sake of brevity, in what follows, each of the above listed variables will be referred to using the abbreviation reported in parenthesis.

Basic notation and tools
The dataset considered in this paper can be seen as a three-way array of continuous real variables, say $\{x_{ijk};\, i = 1, \dots, I,\; j = 1, \dots, J,\; k = 1, \dots, K\}$, where the three indices represent, respectively, the units, the variables and the different occasions to which the data relate. For brevity, the data matrix observed at time $k$ is denoted by $X_k$, $k = 1, \dots, K$. Furthermore, to each occasion it is associated a triplet $(X_k, S_k, D_k)$, with $D_k$ and $S_k$ being positive definite matrices of dimension $I \times I$ and $J \times J$, respectively. In particular, these two matrices are associated with the following inner products:
$$\langle x, y \rangle_{D_k} = x^\top D_k\, y, \quad x, y \in \mathbb{R}^I, \qquad \langle u, v \rangle_{S_k} = u^\top S_k\, v, \quad u, v \in \mathbb{R}^J,$$
where $D_k$ is a diagonal matrix of positive values containing the weights of the units (typically $\frac{1}{I} I_I$) whereas $S_k$ (not necessarily diagonal) represents the metric associated with the variables.
Without loss of generality, in what follows, the columns of $X_k$ will be centred with respect to the matrix $D$ for an easier mathematical representation, that is $\mathbf{1}_I^\top D X_k = \mathbf{0}_J^\top$, where $\mathbf{1}_I$ and $\mathbf{0}_J$ are, respectively, the $I$-vector of ones and the $J$-vector of zeroes. Furthermore, from now on, we will consider $S_k = I_J$ since, for each occasion, the size effect has been removed by dividing each column by its standard deviation.
Since we deal with a three-way array, made up of data matrices related to different occasions, the comparison among these matrices is of particular interest. To this end, we consider the Frobenius matrix product defined on the vector space $M_{I \times J}(\mathbb{R})$ of real matrices, i.e.
$$\langle A, B \rangle_F = Tr(A^\top B), \qquad A, B \in M_{I \times J}(\mathbb{R}),$$
where $Tr$ is the usual trace operator. Since the Frobenius matrix product is known to possess the scalar product properties, it is always possible to extend, to the matrices belonging to $M_{I \times J}$, the variance, covariance and correlation coefficient statistics, denoted respectively as follows:
$$Var(A) = \langle A, A \rangle_F, \quad (4) \qquad Vcov(A, B) = \langle A, B \rangle_F, \quad (5) \qquad RV(A, B) = \frac{\langle A, B \rangle_F}{\|A\|_F\, \|B\|_F}, \quad (6)$$
where $\|A\|_F = \sqrt{\langle A, A \rangle_F}$ denotes the Frobenius norm. In particular, it holds that $RV \in [0, 1]$, taking the value 1 if and only if $A = \gamma B$ for some scalar $\gamma > 0$. It is also noted that the RV coefficient can be seen as a scalar product between matrices scaled with respect to their Frobenius norms (represented by the denominator of the RHS in Eqn. 6).
Finally, we observe that the Frobenius matrix product can also be expressed as
$$\langle A, B \rangle_F = vec(A)^\top vec(B), \qquad (7)$$
with $vec(\cdot)$ being the so-called matrix stack operator, which creates a single column vector by stacking the column vectors of the matrix one below another.
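To make the tools above concrete, the following sketch (assuming NumPy; the function names are illustrative, not part of the original analysis) computes the Frobenius matrix product and the RV coefficient of Eqn. 6, and verifies the vec() identity:

```python
import numpy as np

def frobenius_product(A, B):
    """Frobenius matrix product <A, B>_F = Tr(A^T B)."""
    return np.trace(A.T @ B)

def rv_coefficient(A, B):
    """Scalar product of A and B scaled by their Frobenius norms (Eqn. 6)."""
    return frobenius_product(A, B) / (
        np.linalg.norm(A, "fro") * np.linalg.norm(B, "fro")
    )

# The Frobenius product equals the dot product of the column-stacked
# (vec'd) matrices.
A = np.arange(6.0).reshape(3, 2)
B = np.ones((3, 2))
lhs = frobenius_product(A, B)
rhs = A.flatten(order="F") @ B.flatten(order="F")
assert np.isclose(lhs, rhs)
```

As a sanity check, RV(A, γA) equals 1 for any γ > 0, in line with the property stated above.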

Partial Triadic Analysis
The PTA (Kroonenberg (1989), Thioulouse et al. (1987)) is a statistical method specifically designed to perform the joint analysis of multiple sets of data having the same structure across all the occasions. This approach is similar to the STATIS method (Lavit (1994)); however, in this case, the method is applied directly to the elementary matrices $X_k \in M_{I \times J}(\mathbb{R})$.
The PTA relies on the properties of Euclidean spaces and is carried out in three different steps, as detailed below:

Interstructure
The Interstructure step is aimed at analysing the data globally and, to this end, captures the similarity among the data structures related to the different occasions. Formally, the Interstructure matrix $C$ is based on the Frobenius matrix product, computed as
$$C_{kk'} = \langle X_k, X_{k'} \rangle_F \qquad (8)$$
or, alternatively, as
$$C_{kk'} = RV(X_k, X_{k'}) = \frac{\langle X_k, X_{k'} \rangle_F}{\|X_k\|_F\, \|X_{k'}\|_F}. \qquad (9)$$
It is worth emphasising that the generic element $(k, k')$ represents either the vectorial covariance (Eqn. 8) or Escoufier's RV correlation coefficient (Eqn. 9) between the two occasions $k$ and $k'$. Since the latter allows the assessment of the degree of similarity among the available occasions while controlling for the size effect, it will be chosen for the analysis.
Once the matrix C is obtained, the analysis of the similarities among the occasions is based on the data projection onto the factorial space induced by its first eigenvectors.
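Assuming centred and standardised occasion matrices, the Interstructure step can be sketched as follows (NumPy; the data here are random placeholders, not the BES indicators):

```python
import numpy as np

def interstructure(X_list):
    """Matrix C of RV coefficients between all pairs of occasions (Eqn. 9)."""
    K = len(X_list)
    norms = [np.linalg.norm(X, "fro") for X in X_list]
    C = np.empty((K, K))
    for k in range(K):
        for l in range(K):
            C[k, l] = np.trace(X_list[k].T @ X_list[l]) / (norms[k] * norms[l])
    return C

rng = np.random.default_rng(0)
X_list = [rng.standard_normal((20, 5)) for _ in range(10)]  # 10 occasions
C = interstructure(X_list)

# Similarities among occasions: project onto the leading eigenvectors of C.
eigvals, eigvecs = np.linalg.eigh(C)            # returned in ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
explained = eigvals / eigvals.sum()             # share of inertia per axis
```

The matrix C is symmetric with unit diagonal, so its eigendecomposition directly yields the factorial space used for the representation of the occasions.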

Compromise
The Compromise step is designed to define a common representation of the data through a factorial space referring to the whole period considered. This representation is based on the matrix obtained through the following weighted sum:
$$X^* = \sum_{k=1}^{K} \alpha_k X_k, \qquad (10)$$
where the vector $\alpha = (\alpha_1, \dots, \alpha_K)$ is the first eigenvector of the Interstructure matrix $C$. It can be shown (Lavit (1994)) that the matrix $X^*$ represents the best approximation of the matrices $\{X_k;\, k = 1, \dots, K\}$ with respect to the norm induced by the Frobenius matrix product.
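Assuming a matrix C with positive entries (so that, by Perron-Frobenius, its first eigenvector can be taken with positive components), the Compromise matrix of Eqn. 10 can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def compromise(X_list, C):
    """Compromise X* = sum_k alpha_k X_k, alpha being the first eigenvector of C."""
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    alpha = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue
    if alpha.sum() < 0:                    # resolve the sign indeterminacy
        alpha = -alpha
    X_star = sum(a * X for a, X in zip(alpha, X_list))
    return X_star, alpha

# Degenerate check: identical occasions give C = ones and X* proportional to X_1.
X0 = np.arange(12.0).reshape(4, 3)
X_list = [X0] * 4
X_star, alpha = compromise(X_list, np.ones((4, 4)))
assert np.allclose(X_star, 2.0 * X0)       # alpha_k = 1/2 each, K = 4
```

A principal component analysis of X* then yields the single factorial representation common to all occasions.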

Intrastructure
In this step, the specific information contained in the occasion matrices $\{X_k;\, k = 1, \dots, K\}$ is analysed by decomposing the squared distances, computed using the matrix $C$, through the following equation:
$$D^2_{kk'} = C_{kk} + C_{k'k'} - 2\, C_{kk'}, \qquad (11)$$
$D^2$ being defined as the matrix of the squared distances. By using $D^2$ together with Eqn. 9, the contributions related to each unit are computed, allowing a more insightful understanding of the changes occurring in the indicators and the detection of the most influential units in the data set (Lavit (1994)).
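A minimal sketch of the Intrastructure computations, assuming the squared distances are those induced by the inner products stored in C, with the per-unit contributions obtained from the row-wise decomposition of the squared distance between two occasion matrices:

```python
import numpy as np

def squared_distances(C):
    """D^2 with entries C[k,k] + C[l,l] - 2*C[k,l] (Eqn. 11)."""
    d = np.diag(C)
    return d[:, None] + d[None, :] - 2.0 * C

def unit_contributions(X_ref, X_k):
    """Percentage contribution of each unit (row) to ||X_ref - X_k||_F^2."""
    row_sq = ((X_ref - X_k) ** 2).sum(axis=1)
    return 100.0 * row_sq / row_sq.sum()

# With C equal to the identity, distinct occasions are all at squared distance 2.
D2 = squared_distances(np.eye(3))
contrib = unit_contributions(np.zeros((5, 2)), np.ones((5, 2)))
```

Rows of the resulting contribution table can then be read as the "trajectories" discussed in the data analysis section.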

The Resampling Method
Proposed in Vinod (2006) and subsequently improved in Vinod et al. (2009) and Vinod (2016), the Maximum Entropy Bootstrap (MEB) is the resampling scheme adopted here. At its core, MEB is a fully non-parametric method relying on a resampling mechanism which differs from more traditional solutions. In fact, while in the classic bootstrap an ensemble $\Omega$ represents the population of reference the observed time series is drawn from, in MEB a large number of ensembles, say $\{\omega_1, \dots, \omega_N\}$, become the elements belonging to $\Omega$, each of them containing a number $B$ of replicates $\{x_1, \dots, x_B\}$. In addition, MEB is characterised by the following three important features:
1. it is distribution-free;
2. it is able to handle any type of persistence;
3. it can generate pseudo-replications satisfying both the ergodic and the central limit theorems.
Let $H_S$ be the Shannon entropy, defined as
$$H_S = E[-\log G(x)], \qquad (12)$$
with $G(\cdot)$ the probability density function assigning a probability level to each of the statuses a given system can be found in, and $E(\cdot)$ the expectation operator. The application of Eqn. 12 implies that $G(x)$ is the argument of a standard optimisation problem of the type
$$\max_{G} \; H_S(G), \qquad (13)$$
which is solved within the maximum entropy framework by introducing a set of additional constraints, imposed on both the probability mass function and the probabilistic structure of each of the bootstrap replications, as explained below.
Let $\{X_t\}_1^T$ be the observed time series $x_1, x_2, \dots, x_T$ and denote by $x_{(1)}, x_{(2)}, \dots, x_{(T)}$ its order statistics, so that
$$x_{(1)} \le x_{(2)} \le \dots \le x_{(T)}. \qquad (14)$$
The set of midpoints, denoted by $z_1, z_2, \dots, z_{T-1}$, is computed as
$$z_t = \frac{x_{(t)} + x_{(t+1)}}{2}, \qquad t = 1, \dots, T-1, \qquad (15)$$
and defines the half-open intervals $B_t$ around the order statistics. As for the above-mentioned constraints, two of them are imposed on the density function $G(x)$ to guarantee that the whole information set is correctly accounted for. Formally, each interval $B_t$ is required to receive probability mass $1/T$, and the interval means $m_t = E_G(x \mid x \in B_t)$ must satisfy the mean-preserving constraint. In particular, the last constraint is designed to preserve the probabilistic structure of the system and is fulfilled by imposing perfect rank correlation between the original series and its bootstrap replications.

MEB algorithm
The MEB algorithm generates a single bootstrap sample through the following steps:
1. the order statistics for the observed data $x_t$ are generated according to (14);
2. a sorting matrix of dimension $T \times 2$, say $W_1$, accommodates the time series $x_t$ in the first column and an index set, denoted by $I_{ind} = \{1, 2, \dots, T\}$, in the other one;
3. $W_1$ is sorted according to the first column, thus defining the order statistics $x_{(t)}$; the midpoints $z_t$ and the half-open intervals $B_t$ are then computed;
4. $T$ pseudo-random numbers are drawn from a uniform distribution, $p_s \sim U[0, 1]$, $s \in \{1, \dots, T\}$, and a range of values $R_t = \left[\frac{t}{T}, \frac{t+1}{T}\right)$, $\forall t \in \{0, 1, \dots, T-1\}$, wherein each $p_s$ falls, is defined;
5. $R_t$ and $I_t$ are matched and a new set $\tilde{x}^*_t$ is drawn using the density function $G(x)$;
6. a new sorting matrix $W_2$, exhibiting the same structure as $W_1$, is defined; it sorts the $T$-dimensional vector $\tilde{x}^*_t$ in increasing order and thus defines the order statistics $\tilde{x}^*_{(t)}$;
7. the order statistics $x_{(t)}$ in $W_1$ are replaced by the $\tilde{x}^*_{(t)}$ generated in Step 6; the result is then sorted back according to the index column of $W_1$ and $x^*_t$ is recovered. The vector $x^*_t$ stores a single bootstrap replication of the original series $x_t$.
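The steps above can be sketched as follows. This is a simplified, illustrative implementation (assuming NumPy): the maximum entropy density is taken as piecewise uniform between the extreme order statistics, whereas the full algorithm of Vinod (2006) also widens the tail intervals and enforces the mean-preserving constraint exactly.

```python
import numpy as np

def meb_replicate(x, rng):
    """One Maximum Entropy Bootstrap replicate of the series x (sketch)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    order = np.argsort(x)                     # steps 2-3: sorting matrix W1
    xs = x[order]                             # order statistics x_(t)
    z = (xs[:-1] + xs[1:]) / 2.0              # midpoints z_t
    lo = np.concatenate(([xs[0]], z))         # interval limits of B_t
    hi = np.concatenate((z, [xs[-1]]))        # (tails kept finite here)
    p = rng.uniform(size=T)                   # step 4: p_s ~ U[0, 1]
    idx = np.minimum((p * T).astype(int), T - 1)   # interval each p_s falls in
    frac = p * T - idx                        # position within the interval
    xtilde = lo[idx] + frac * (hi[idx] - lo[idx])  # step 5: draws from G(x)
    xtilde_sorted = np.sort(xtilde)           # step 6: sorting matrix W2
    out = np.empty(T)
    out[order] = xtilde_sorted                # step 7: restore original ranks
    return out

rng = np.random.default_rng(42)
x = np.array([3.0, 1.0, 2.0, 5.0, 4.0])
x_star = meb_replicate(x, rng)
# perfect rank correlation with the original series, as required
assert (np.argsort(x_star) == np.argsort(x)).all()
```

Note how the final reordering step, rather than the draws themselves, is what preserves the time-dependence structure of the original series.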

Data bootstrapping
In this section, the resampling scheme introduced in Section 5 is used to generate the bootstrapped data. Before that, the main issue, related to the choice of the number B of bootstrapped series, is addressed. In practice, the final decision has been made using a trial and error approach, in which three different bootstrap sample sizes, i.e. B = 75, 125, 175, have been tested. The number of pseudo-series has finally been set to B = 75, according to the best variance-bias trade-off criterion.
The bootstrap exercise has been carried out considering each time series separately, without attempting any explicit inclusion of the correlation structures present in the original data. The only cross-variable connection accounted for is related to the bootstrap time rank, which has been kept constant for each run. Such a choice is justified by the limited sample size available, hardly suitable for multivariate approaches, and corroborated by the literature, where several multivariate applications of the MEB algorithm based on just one variable at a time can be found. For example, Srivastav et al. (2017) proposed a multivariate bootstrap-based scheme aimed at capturing the multicollinear dependency structure embedded in the investigated data set, related to weather variables. Their approach envisions the employment of the MEB algorithm in conjunction with an orthogonal linear transformation. Lin Shang (2017) faces the problem of the online prediction of high-frequency ecological data, as well as of the related forecasting intervals, in a high-dimensional data context. In essence, the proposed forecasting method decomposes a functional time series into a number of functional principal components and their associated scores. In such a setup, which clearly poses statistical challenges (summarised by the concept known as the curse of dimensionality), the author successfully employed the MEB scheme to construct bona fide replications of the original time series, with remarkable performances recorded on real-life data. Finally, Vinod et al. (2009) discuss a Keynesian consumption function, based on Friedman's permanent income hypothesis, applying the MEB scheme separately to each of the considered time series.

Data analysis
In this section we apply the PTA to the three-way array of the BES indicators, described in Section 2.
We first carry out the Interstructure step by computing the matrix C of RV coefficients between pairs of occasions (see Eqn. 9). By inspecting Table 4, where the related results are reported, the high degrees of correlation between the occasions, always greater than 0.79, are noticeable. The factorial representation calculated from the matrix C is reported in Figure 1, where the first two axes account for 94.6% of the total inertia (the first axis explains 89.8%). It is worth mentioning that the data points relative to the different years are not too scattered and, in fact, exhibit an approximately regular pattern. In order to assess the goodness of the eigendecomposition of the matrix C, B = 75 MEB replicates of the whole data array have been generated. For each bootstrap replication, the matrix storing the RV coefficients, say C_b, as well as the related eigendecomposition, are computed. As a result, the B bootstrap distributions related to both the eigenvalues and the eigenvectors are obtained.
The outcomes relative to the bootstrap distributions of the first two eigenvalues are represented by the 5-95% quantile-based boxplots reported in Figure 2. From their inspection, it can be noticed that both the first and the second eigenvalue of the matrix C lie within their respective 5-95% bootstrap intervals. The remaining eigenvalues are left out of the analysis since they account for a negligible amount of inertia.
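The bootstrap assessment of the eigenvalues can be sketched as follows (the bootstrapped RV matrices below are simulated placeholders; in the actual analysis each C_b is computed from one MEB replicate of the full data array):

```python
import numpy as np

def eigenvalue_bands(C_boot, q=(5, 95)):
    """5-95% bootstrap intervals for the eigenvalues of the RV matrix."""
    eig = np.array([np.sort(np.linalg.eigvalsh(Cb))[::-1] for Cb in C_boot])
    return np.percentile(eig, q, axis=0)

rng = np.random.default_rng(0)
K, B = 10, 75
# placeholder bootstrap RV matrices (correlation matrices of random data)
C_boot = [np.corrcoef(rng.standard_normal((K, 30))) for _ in range(B)]
q05, q95 = eigenvalue_bands(C_boot)
# an observed eigenvalue is judged stable when it lies inside [q05, q95]
```

The same recipe, applied component-wise to the first eigenvectors, yields the intervals used for the weights of the Compromise step.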
By applying the same procedure to the first two eigenvectors, the 5-95% quantile-based boxplots of the first eigenvector components (one for each year), reported in Figure 3, are generated. Also in this case, all the first eigenvector components of the matrix C lie within their respective 5-95% bootstrap intervals. Both these results seem to confirm that the representation obtained in the Interstructure step can be considered reliable. Recalling Eqn. 10, we can therefore use these weights for the construction of the Compromise matrix X*.
In order to assess the quality of the projection of each matrix {X_k; k = 1, ..., K} on the Compromise space, we use Figure 4, where the typological values of the matrices X_k are depicted. Here, the weights α_k (Eqn. 10) and the squared cosine of the projection on the Compromise space are reported, respectively, on the x and y axes. Both the high-quality representation of these matrices (the squared cosine is always greater than 0.91) and the fact that the estimated weights are approximately similar to each other are necessary conditions for a reliable Compromise space. Figures 5 and 6 show, respectively, the scree-plot and the biplot of the Compromise matrix X*, which represents the "common structure" of the matrices {X_k; k = 1, ..., K} (Bolasco (1999)). It is noted that the first two axes account for 88.1% of the total inertia. Tables 2 and 3 report the contributions of the units to the principal components and their quality of representation, respectively. By inspecting Figure 6, furthermore, we observe that the first axis is positively correlated with the variable neet, which represents the percentage of people not in education, employment or training, and negatively correlated with the variables representing secondary and tertiary education (hi sc and degree) and, though to a lesser extent, with the percentage of people aged 25-64 participating in formal or non-formal education (cont tr). Based on these associations, it can be said that this axis accounts for the propensity to achieve higher levels of education, even if not formally recognised. On the other hand, the second axis of the biplot is mainly associated with only one variable, i.e. the one accounting for the percentage of children aged 4-5 participating in pre-primary or primary education (kinder).
Considering the second axis, it is noted that Marche, Aosta Valley, Trentino-South Tyrol and Tuscany exhibit low degrees of propensity towards sending their children to pre-primary or primary schools. On the other hand, an opposite behaviour is shown by some regions, e.g. Sicily, Friuli-Venezia Giulia, Marche and Lazio.
As for the Intrastructure step, the matrix D² representing the squared distances among the occasions (years), and thus the differences expressed at the whole regional system level, is computed (see Eqn. 11) and reported in Table 4. In order to define a clear and informative set of comparisons between the years, only the squared distances between the first year of the time span (2008, the reference year) and, one at a time, all the following years are considered. The percentage contributions of each region to the squared distance, computed through Eqn. 11 and Eqn. 6, are reported in Table 5. According to Lavit (1994), the rows of Table 5 can be interpreted as "trajectories". In fact, they account for the relative contribution given by each region to the overall change recorded by the whole regional system in the considered time span.
Some of the regions give a relevant contribution to the squared distance: in particular Piedmont, Friuli-Venezia Giulia, Umbria and Campania, which show an average contribution greater than 6% (and greater than 10% in at least two occasions). It is worth pointing out that these regions belong to all the macro-regions Italy is usually broken into, i.e. North-West (Piedmont), North-East (Friuli-Venezia Giulia), Centre (Umbria) and South (Campania).
As already mentioned, the reliability of the observed trajectories has been assessed through their bootstrap distributions. In practice, these distributions have been obtained by computing (see Eqn. 11 and 6) the relative contribution given by each region to the squared distance for each of the bootstrapped series, say D²_b, b = 1, ..., B. Figure 7 reports the trajectories of the most influential regions, showing the medians and the 5-95% bootstrap intervals related to the four above-mentioned regions (the continuous line represents the original sample values, the dashed line the bootstrap median values). Also in this case, the data lie inside the bootstrap confidence limits, a fact that confirms the reliability of the observed trajectories.

Final remarks
In this paper we proposed a bootstrap-assisted method for the analysis of the BES indicators through the PTA. Its application to a set of basic indicator time series, relative to the domain of education and training, proved its usefulness in capturing both the structural relationships and the dynamics generated at the regional level.
In more detail, the three-stage PTA technique has made possible the global analysis of the whole set of indicators for each year, so that their dynamics could be investigated over the available timespan. Furthermore, the definition of a suitable Compromise space allowed for a single factorial representation of the Italian regions, taking into account the information related to each of the considered years. Finally, the dynamics characterising the most influential regions, i.e. those most responsible for the changes recorded at the whole regional system level, have been represented.
In addition, an assessment procedure for the PTA has been proposed on the basis of a suitable resampling scheme and applied to each step of the analysis. In particular, given the time-dependent nature of the data, we made use of the MEB algorithm which, as is well known, is designed to perform satisfactorily under milder assumptions than more traditional schemes. The use of the assessment procedure in each step of the analysis is particularly important given the sequential nature of the PTA approach.
To do so, an adequate number of replicates of the observed data array has been artificially generated through the MEB algorithm, and the corresponding matrices of RV coefficients have been computed. Using the resulting bootstrap distributions, we were finally able to carry out a bootstrap-based assessment of the key elements of the analysis (i.e. the eigenvalues and eigenvectors of the RV matrix and the squared distances among the occasions).
Our findings point towards the MEB algorithm as a robust resampling tool able to deliver satisfactory performances in our particular context, where the number of occasions (i.e. the length of the time series) is of limited size. Furthermore, to the best of the authors' knowledge, this is the first time a bootstrap method for longitudinal data has been employed in PTA.
Future directions of the present research include the estimation of the minimum sample size of the data arrays required for our procedure to be valid.

Disclaimer
The views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of their institution.

Declarations
Funding: The authors did not receive support from any organization for the submitted work.