On the use of self-organizing maps in detecting teleconnections. Part II: temporal aspects

Self-organizing maps (SOMs) represent a method widely used to obtain representative states of atmospheric circulation and link these states to local climate on one hand, and large-scale modes of circulation variability on the other. In the present Part II of our study, we focus on the temporal aspects of the link between SOM circulation types (CTs) and main building blocks of spatial-temporal variability represented by (i) three predened idealized oscillatory variability modes and (ii) four leading modes of Euro-Atlantic circulation variability extracted by principal component analysis (PCA). The CTs respond to various changes in modes, including variations in their strength and preferred phase. However, compared to these responses, the detected sampling variability inherent to decadal-scale datasets generated from identical setting of modes is surprisingly high, showing that trends in the frequency of CTs of approximately ± 30% can occur without any change in the strength and phase of the underlying modes. This suggests that in order to achieve robust changes in CT frequencies, either an unrealistically large change in the underlying variability mode, which is inconsistent with reanalysis data, is required, or simultaneous contributions of two or more modes that superimpose one another are needed. Consequently, to attribute changes in CTs to variability modes, other methods (e.g., PCA) should be used in parallel to SOMs to avoid misinterpretations. Since the results obtained for the idealized modes agree with our ndings for real-world circulation, we believe that the rather simplistic idealized modes may be used in future studies that would extend the research to non-linear aspects of teleconnectivity, including (but not limited to) non-stationary spatial patterns and non-linear combination of variability modes in generating synthetic datasets.


Introduction
The large-scale atmospheric circulation, its natural variability and changes under climate change have been extensively studied due to their close links to surface atmospheric elements, especially in midlatitudes. Many methods have been used to simplify the wide spatial-temporal variations of atmospheric circulation, the approaches of circulation classi cations and decomposition into modes of circulation variability being among the favorite ones.
The circulation classi cations consist in de ning a relatively small number of representative circulation types (CTs) or recurrent circulation regimes, and classifying elds of a circulation variable with these types/regimes according to a measure of similarity, which is usually pattern-to-pattern correlation or Euclidean distance. There are countless algorithms for obtaining a classi cation, an overview of which can be found in Huth et al. (2008). This study focuses on one of these algorithms, viz. the self-organizing maps (SOMs).
The SOMs (Kohonen 2001;Hewitson and Crane 2002;Sheridan and Lee 2011) represent one of the most popular methods of studying atmospheric circulation. Methodologically, they constitute a neural network based approach that aims to nd a set of states across the circulation data space that together reasonably approximate all recorded circulation elds. Relative to other wide spread classi cation methods, such as the Jenkinson-Collison method or various algorithms of cluster analysis, the SOMs are not only able to nd CTs across the whole spectrum of circulation elds but also sort the CTs into atypically two-dimensional-array or "map" that preserves the topological structure of the data space.
Therefore, close (in terms of Euclidean distance; ED) CTs lie in the array relatively closer to each other while CTs with strong anomalies of opposite sign tend to be "pushed away" from each other into opposite corners or edges of the array (e.g., Leloup et al. 2007;Reusch et al. 2007;Reusch 2010).
Although studies comparing SOMs with other classi cation methods (Fleig et al. 2010;Huth 2010;Tveito 2010;Lykoudis et al. 2010;Broderick and Fealy 2015;Trigo et al. 2016;Palm et al. 2017) clearly showed that different methods turn out to be optimum depending on the application, it has been the SOM that sparked a new interest in synoptic climatology owing to its novel point of view on circulation variability (Sheridan and Lee 2011). This interest resulted in a wide range of studies on circulation in outputs of reanalyses and climate models, links between circulation and various atmospheric and environmental phenomena, as well as links between synoptic-scale CTs to large-scale teleconnections, bringing new insights into circulation over Asia (Liu et al. 2016;Gao et al. 2019;Ohba and Sugimoto 2019), Polar regions (e.g., Schuenemann et al. 2009;Bezeau et al. 2015;Yu et al. 2018), Australia and New Zealand (e.g., Jiang et al. 2013;Huva et al. 2015;Gibson et al. 2016a,b;Harrington et al. 2016; The above mentioned methods of identifying teleconnections have been disputed by some researchers pointing to their limitations including spatially xed structure, spatially symmetric opposite phases, normality, linearity and/or orthogonality imposed on data, and possible extraction of statistical artifacts rather than physical modes (e.g., Monahan et al. 2001;Chen and Van den Dool 2003;Cassou et al. 2004;Sheridan and Lee 2012;Hunt et al. 2013). Instead, Cassou et al. (2004) applies cluster analysis to lowpass ltered North Atlantic-European sea level pressure (SLP) elds produced by National Centers for Environmental Prediction-National Center for Atmospheric Research (NCEP-NCAR) reanalysis to obtain four signi cant climate regimes, the rst two of which capture the negative and positive phases of the North Atlantic Oscillation (NAO; Hurrell et al. 2001), providing a non-linear and non-symmetrical perspective of the leading teleconnection in the region. To extend this continuum perspective of teleconnections, Johnson et al. (2008) apply SOMs to Northern Hemisphere SLP data. Leading teleconnections are then represented by one or more of the CTs in a SOM array and changes in the teleconnections are assessed by evaluating variations in the occurrence of the respective CTs. Various teleconnections have been analyzed utilizing this alternative perspective, including El Niño-Southern Oscillation (e.g., Leloup et al. 2007;Johnson 2013;Ashok et al. 2017), Madden-Julian oscillation (Chattopadhyay et al. 2013), the Indian Ocean dipole (Tozuka et al. 2008;Verdon-Kidd 2018), the Arctic oscillation (Dai and Tan 2017), as well as North Paci c (Johnson and Feldstein 2010;Yuan et al. 2015) and North Atlantic (Johnson et al. 2008;Rousi et al. 2015Rousi et al. , 2020 teleconnections. The two perspectives on teleconnections relying on different decompositions of spatial-temporal variability are distinct and demand speci c interpretation, but also closely linked together and complementing each other, and applying them in parallel provides potential for new ndings. However, only few studies have compared the approaches from the methodological point of view (Reusch et al. 2005(Reusch et al. , 2007Liu et al. 2006;Rousi et al. 2015). In Part I of the study, we analyzed how prede ned idealized modes of variability project on SOM arrays, and whether the two-dimensional SOMs are able to capture other modes than the leading two modes of variability, as far as the spatial patterns are concerned. In the present Part II, we focus on the temporal aspects of the link between modes of variability and SOM CTs, namely on how CT frequency of occurrence depends on data sampling, strength of modes and its changes, and on the phase of modes. Furthermore, to assess the validity of our ndings for real atmospheric data, results obtained for idealized and real-world variability modes are compared. The paper is organized as follows: In Sect. 2, we describe the methodology of generating and analyzing the synthetic datasets and training of SOMs, in Sect. 3.1 we show results for idealized variability modes, in Sect. 3.2 we focus on Euro-Atlantic variability modes, and in Sect. 4 we provide discussion and conclusions.

Data And Methods
In the paper, two kinds of synthetic data sets are used to test the projection of variability modes on SOMs, the rst constructed using three idealized modes, which are identical to Part I, the other based on modes calculated by SPCA of daily mean winter 1948-2016 NCEP-NCAR Euro-Atlantic 500 hPa geopotential height (GPH) elds.

Synthetic data generated from idealized modes of variability
The idealized modes express variability in zonal (Z), meridional (M), and circular (C) ow; spatial patterns of positive phases of the modes (denoted as Z+, M+, C+) are shown in Fig. 1. The modes are de ned as orthogonal so that they mimic modes identi ed by SPCA. Using the modes, 100 datasets were generated, each dataset comprising 1,000 circulation elds that consist of 20×20 grid points, each eld being a linear combination of the three modes, where each mode is multiplied by a score. The scores are drawn from normal distributions, the mean of which is equal to zero and variance equal to the prede ned explained variance (EV) of the particular mode. Note that this process is equivalent to reconstruction of data from principal components (PCs) obtained by SPCA decomposition, except for randomly generating the weights instead of obtaining them from the time series of PC scores. One particular combination of EVs of modes was selected from the variants tested in Part I-"Z50-M30-C20" (each number representing the EV of the given mode), as this choice gives ample opportunity for modifying the EVs of each mode in either direction. Additionally, it ensures that a wide range of elds is generated, which in turn means that all modes are represented among CTs. In the next step, more datasets were generated in which the EV of the leading mode gradually increases (decreases) at a step of 1 percentage point while variance is proportionally subtracted from (added to) the remaining two modes. Additionally, several datasets were generated in which only the scores were altered while EVs of modes remained xed.

Synthetic data generated from Euro-Atlantic modes of variability
The North-Atlantic variability modes were obtained by SPCA decomposition of winter (DJF) 1948-2016 daily mean 500 hPa GPH anomaly elds from the output of the NCEP/NCAR reanalysis (Kalnay et al. 1996). The normalization was carried out separately for each calendar day by subtracting the respective 68-year average, for a spatial domain of 20×20 grid points covering the area of 22.5-70N and 30W-17.5E (at 2.5° horizontal step). Furthermore, the value at each grid point was weighted by the area that the grid point represents in order to account for an uneven spatial sampling density. Subsequently, the covariance matrix was calculated from the data and decomposed by SPCA. The leading four PCs, which together explain approximately 77% of the total variability, were retained. Rather than analyzing the output directly, the PCs were rst rotated using the Varimax orthogonal rotation (Richman 1986). Rotation of PCs is recommended in order to remove statistical artifacts that may occur among the modes (Huth and Beranová 2021). In Fig. 2 the spatial patterns and EVs of the four variability modes are shown.
The obtained modes were then used to generate quasi-realistic daily anomaly patterns in the same manner as for idealized modes, that is, using the (normalized) rotated loadings as spatial patterns of modes and variance of the rotated PCs as a parameter for generating time series of scores. First, a total of 100 data sets (each comprising 1,000 daily patterns) were generated, each having loadings and EVs of modes xed to the values we had obtained for rotated PCs, but relaxed time series of scores (see Fig. 3).
To properly account for the variability explained by non-retained PCs, each daily pattern was further superimposed by a noise pattern generated randomly such that its spatial autocorrelation approximates that of the reanalysis anomaly data after the information explained by the four retained PCs was removed and the total variance explained by the noise patterns in each grid point approximates the combined variance of non-retained PCs. For simplicity, spatial variations in the autocorrelation were ignored-its values between neighboring points were set to 0.9. The data were generated serially uncorrelated, since temporal autocorrelation would not have any effect on any of our intended analyses.
Second, datasets were generated in which the EV of the rst mode gradually increases (decreases) at a step of 0.9 percentage points from 20.4% (its strength in NCEP/NCAR) while variance is subtracted from (added to) the remaining three modes at a step proportional to their EV in the reanalysis. Therefore, the EV of the rst mode across the 85 generated datasets ranges from 0.6-76.2%. The variance explained by noise is identical in all kinds of datasets generated from real-world modes (approximately 23%). Third, several datasets were generated in which only the distribution of scores was altered (shifted toward positive scores), while EVs of modes remained xed.
Last, several additional datasets were de ned for both idealized and real-world modes, including datasets that comprise 10,000 instead of 1,000 elds. The motivation behind these auxiliary datasets and their exact speci cation is deferred to later in the text.

Training of SOMs and classi cation of data
In the next step, all datasets were classi ed by SOMs, using the "Kohonen" R package (Wehrens and Buydens 2007;Wehrens and Kruisselbrink 2018). The output of a SOM classi cation depends on multiple parameters, including the way how the iterative process is initialized and what is the size (i.e. number of CTs) and shape (i.e. number of rows and columns of the two-dimensional array of CTs) of the particular SOM. We considered several sizes of SOM arrays, ranging from 2×3 (rows × columns) to 6×7. Utilizing results of Part I and additional preliminary analyses, the 4×5 SOM array (20 CTs) was chosen for this study, since smaller arrays did not include enough types resembling lower-order modes, while larger arrays tended to have bad topology and/or comprise too many too similar patterns.
The methodology is indicated in Fig. 3 for the Euro-Atlantic modes of variability: First, the reanalysis data were used to train 200 SOMs that differ only in their random initialization. Second, from these SOMs, one was selected by evaluating their quantization and topological errors. Third, both reanalysis data and all generated datasets were projected onto the SOM, by classifying each circulation eld with the closest (in terms of Euclidean distance) CT, and calculating the frequency of occurrence of each CT in each dataset.
To analyse the links between modes and types, the types were further meta-classi ed by calculating the correlation between their pattern and the patterns of modes. A type represents the positive (negative) phase of a mode if the pattern correlation is stronger than a positive (negative) threshold value. The choice of such threshold is to some extent arbitrary, nevertheless, it needs to re ect the structure of data (the fewer the modes, the higher the correlations) and parameters of the classi cation (the fewer the types, the lower the correlations). In general, the value should be higher than 0.7 (R 2 ≈ 50%), in order that one mode contributes to most of its spatial variability. On the other hand, too high a threshold could lead to some modes (especially, the lower-order ones) being unrepresented. For idealized modes, the value 0.8 seemed to be a good compromise between the two requirements, although it had to be relaxed to 0.75 for the third-order mode. In the case of real-world modes, which led to more complex data, a lower threshold (0.7) was chosen.
The idealized modes were analysed in a nearly identical way, except for prede ning the modes instead of extracting them from the reanalysis data and training the SOMs in one of the generated datasets instead of in the reanalysis data.
Since the processes of quality evaluation of SOMs and selection of other important SOM parameters were described and discussed in detail in Part I, we refer an interested reader there.

Idealized variability modes
In this section, we describe results for datasets generated from three idealized variability modes. First, the sampling variability of the frequency of CTs is analyzed; second, the sensitivity of CT frequency to changes in EV of modes is analyzed; and third, the sensitivity of CT frequency to changes in the phase of modes is assessed. All results are obtained by projecting generated datasets on a 4×5 SOMs, shown in Fig. 4, which was trained on one of the 100 Z50-M30-C20 datasets. In the gure, CTs with spatial pattern similar to one of the modes are highlighted (Fig. 1): eight CTs (clustered in two opposite corners) resemble the zonal mode, and six (four) CTs closely resemble the meridional (circular) mode. In Part I, we showed that the third-order mode projects on the 2D SOM arrays rather weakly; here, this is also apparent from the overall lower pattern correlations for the circular mode. In order that both phases of the mode are represented, the threshold correlation is set lower (0.75) than for the leading two modes (0.8).
The sensitivity of CT frequencies to random generation of scores of modes (without changing the ratio of EVs) is rst analyzed for datasets comprising 1,000 elds ( Fig. 5a; red dots). This size of datasets approximates the length of a time series of daily elds for one 3-month season and decade (≈ 900 days). The results should, therefore, be indicative of the rate of sampling uncertainty one needs to consider when interpreting decadal-scale changes in atmospheric circulation. For this size of datasets, the sampling variability seems to be rather large: while the relative frequency of CTs varies between about 2% and 8%, an average sampling variability (expressed as the range of frequencies) is about 3.5 percentage points. In other words, if one considers a CT occurring 50 times in the datasets on average, such a CT occurs in individual datasets in about 33-66 instances.
Since the "average" (or, population) frequency of CTs is in reality unknown, a more useful way to express the variability may consist in looking at dataset-to-dataset differences in the frequency of individual CTs, drawing pairs of datasets randomly from the pool of 100 datasets. In Fig. 6a, the boxplots show the rate of these differences as well as how the differences depend on the relative frequency of CTs. The change in rare CTs (relative frequency under 2%) appears to be overwhelmingly positive and the interval containing the 90% of values between the 5th and 95th percentile (hereafter, "90% interval") is very wide (approx. -20-170%). On the other hand, the interval is narrowest and most of the differences negative for the most common CTs (approx. -30-20%). For the two categories between 4-6% (highlighted in blue in Fig. 6a), which are representative of most CTs due to the tendency of the classi cation to produce equally populated clusters, the 90% interval is bounded by approximately ± 30%.
The bias toward positive (negative) trends in small (large) CTs is due to how CT frequency is expressed; in this case, it does not refer to the population frequency but rather to the frequency in one of the two randomly generated datasets ("Dataset 1", or "reference"). The result suggests that when interpreting differences between classi cations, a tendency to positive (negative) trend in CT frequencies of small (large) CTs may be linked to sampling variability rather than real changes in circulation, e.g., variability modes. In general, both perspectives of sampling uncertainty indicate that in the case of a decrease or an increase in CT frequency, especially that which is smaller than about 30%, one should not neglect the effect of sampling on results. In Fig. 5b, we show (for the rst row of CTs only) the same result as in Fig. 5a, except for larger datasets (n = 10,000). Such an increase in size leads to a marked decrease in sampling variability in CT frequencies by approximately 70%.
The sensitivity of CT frequencies to gradual changes in the strength of modes is tested in several experiments. First, we analyse datasets in which the strength of all three modes changes simultaneously (Fig. 5a, black dots). For a majority of CTs, changes in EV of modes lead to apparent trends in the frequency of their occurrence, but not for all of them. For example, from the eight "zonal" CTs, #2 and #15 respond only very weakly even if the mode explains nearly all variability. CT #13, which is most similar to the pattern of the third mode (C), responds very weakly to changes in EV of that mode, and stays quite frequent even if the mode is nearly turned off. This lack of response of #13 is replicated in another experiment (Fig. 5c), in which the EV of Z is xed and EV is gradually subtracted from C and added to M.
In this case, a drop in EV of C by 67% leads to a rather small 25% decrease in frequency. Last but not least, when EV of Z is xed, most zonal CTs seemed to have a trend-albeit weak-due to changes in other modes.
Marked changes in frequency of CTs are apparent only for rather extreme changes in EVs, which are unlikely to occur in reality, at least for leading modes, as will be documented later for reanalysis data. To generalize the results, and to assess the rate of changes in CT frequency to rather strong, but feasible, changes in EV of modes, we analyze the change in CT frequency equivalent to strengthening of modes by 20%. To circumvent the large sampling variability, the change in frequency is calculated utilizing the large datasets (shown in Fig. 5b) and is indicated in Fig. 6a (colored numbers). In the gure, the position of all CTs meta-classi ed with a mode (see Fig. 4) corresponds to their relative frequency (horizontal axis) and the change relative to this frequency due to strengthening of the mode the particular CT represents by 20% (e.g., the response in "Z" CTs is calculated by comparing Z60-M25-C15 to Z50-M30-C20). Of the 18 CTs associated with a mode, only one CT (#17) has a response outside the 90% interval of sampling variability. We also carried out an identical analysis, except for weakening of the modes (not shown). In this case, four CTs (#1, #8, #17, and #20) lied outside the interval. However, in nearly all cases, there is an agreement in the sign of the change (that is, all zonal CTs become less frequent if variance is subtracted from the zonal mode, and vice versa). This is what differentiates the response to changes in EV from sampling variability, for which the number of positive and negative changes in CTs associated with a mode tends to be approximately equal (not shown).
Last, we analyze the sensitivity of CT frequency to changes in the phase of modes. To this end, three sets of large datasets (10,000 elds; one set per mode) are generated in which the ratio of positive and negative scores of one mode is shifted from the original 50:50 distribution in favor of positive scores (the remaining two modes being unchanged). We focus in detail on two speci c examples (60:40, i.e., increase in frequency of positive scores by 20%, and 2:1); several other cases are also presented in Fig. 7a. Shifting a mode toward one phase seems to have a considerably more robust response in the frequency of CTs associated with the mode, compared to simply making the mode stronger: When changing the ratio to 60:40, the response in 72% of CTs can be distinguished from sampling variability, while there are no cases in which a CT would have a robust response to the same change in a mode it is not associated with. When the ratio is further increased to 2:1, all CTs have a robust response to the change in the mode they represent. However, in this case, ve (28%) CTs respond also to the shift in one of the modes with which they are not associated. For example, #2, associated with the zonal mode (pattern correlation of -0.84, that is, it represents the mode in its negative phase) becomes 26% less frequent if the scores of Z are shifted to 60:40, and even 32% more frequent if the scores of C are shifted instead, neither of the two values being outside the 90% interval of sampling variability of this CT. If the ratio of scores is further increased to 2:1, the CT has a robust response to changes in both modes (-41% for Z, + 55% for C). It is not clear why the response is stronger to changes in a mode with which the CT is less associated (r < 0.5), but it shows that even if one speci c kind of change in modes can be isolated, attributing changes in CTs to particular modes is far from straightforward.

Euro-Atlantic variability modes
In this section, experiments carried out on datasets generated from variability modes extracted from NCEP/NCAR reanalysis data are described and discussed. The goal of the study is to assess to what extent the ndings obtained for idealized modes of variability, shown above as well as in Part I of the study, are applicable to real atmospheric circulation variability. Therefore, to ensure comparability the methodologies of analyzing idealized and real-world modes are as similar as possible, including the choice of the grid (20×20 points) and SOM array (4×5 CTs). Compared to idealized data, the generated datasets are considerably more complex in this case. Four variability modes, together explaining approximately 77% of variability in winter daily 500 hPa GPH elds were retained and rotated. After rotation, the spatial patterns (Fig. 2) of the leading three modes resemble the regional structure of NAO, the East-Atlantic pattern, and the Eurasian pattern type 2, in turn (Barnston and Livezey 1987; Beranová nad Huth 2008). The weakest mode having one variability center in the Mediterranean region does not resemble any mode identi ed in other studies. We refer to the modes by their generic names (mode 1, mode 2, etc.) in the text, since-except for NAO-we believe that the lower-order variability modes are not generally known and/or are known under various names.
In Fig. 8a, the SOM of reanalysis data is shown. Fifteen (out of twenty) CTs are associated with modes, and each mode is associated with between 3-5 CTs. Each of these CTs is associated with one mode, except for #16 that is strongly correlated both with mode 1 and 4 (|r| ≈ 0.85); it is therefore metaclassi ed with two modes. The CT relative frequency in the reanalysis is indicated in Fig. 8b. The occurrences of most CTs in the reanalysis (blue hyphen) and in datasets generated from the real-world modes (red dots) are in agreement. The generated data markedly underestimate (overestimate) the occurrence of CTs #14 (#8), both of which are correlated with the modes only relatively weakly. Therefore, they are not associated with any of the modes, and as such they are not considered in the further analysis.
The sampling variability of CT frequencies in datasets with xed EV of modes and relaxed component scores and the sensitivity of CT frequencies to gradual changes in the strength of modes are indicated in Fig. 8b by red and black dots, respectively. The sampling variability averaged over the 15 meta-classi ed CTs is nearly identical to that of idealized modes, that is, 3.5 percentage points; therefore, a CT occurring 50 times in the datasets on average appears in individual datasets in approximately 33-66 instances. The sensitivity of CT frequencies to changes in EVs of real-world modes is also very similar to the idealized modes. Evidently, some CTs do not respond to changes in EV of associated modes at all (esp. #18, associated with mode 2) or respond only in a narrow band of the tested EV spectrum (all CTs associated with mode 4). The response in CTs associated with a mode is in most cases directly proportional to the change in EV of the mode; nevertheless, it seems to be relatively weak compared to sampling variability, unless unrealistically large changes in EVs are considered. In the reanalysis, the change in EV of individual modes between consecutive periods of 1,000 days rarely (never) exceeds 20% (25%), or four ( ve) percentage points. In other words, most of the range of EVs tested here and shown in Fig. 8b is theoretical and only cases close to the EVs observed in the reanalysis are of further practical interest.
In Fig. 6b, we show the change in relative frequency of CTs caused by strengthening of the mode (with which the particular CT is associated) by 20% and compare it to sampling variability. Identically to the idealized modes above, the variability is expressed as a relative change in frequency between multiple pairs of randomly selected datasets and shown by boxplots for several categories of CT frequencies. The results are very similar to those for the idealized modes; the position and the width of the 90% interval of sampling variability both depend on CT frequency in the reference dataset (DS1), the differences being rather large and overwhelmingly positive in small CTs and somewhat smaller and largely negative in large CTs. The variability in the CTs close to the median frequency (4-6%, in blue boxplots in Fig. 6b) is close to the ± 30% found for the idealized modes above. Of the 15 CTs, none has a response to strengthening (Fig. 6b, numbers) or weakening (not shown) of "its" mode by 20% that would lie outside of the 90% interval of its own sampling variability (Note that these intervals differ from those shown in the boxplots, which do not discriminate between CTs).
Except for the weakest mode, there is an agreement in the sign of changes of EV of modes and frequencies of associated CTs. Namely #16, associated with both mode 1 and 4, becomes markedly less frequent when mode 4 becomes stronger, due to its response to the simultaneous weakening of mode 1. This illustrates a drawback of interpreting changes in CT frequency in terms of changes in strength of teleconnections (i.e., EV of modes)-most of CTs have relatively high correlations with at least one additional mode, and they respond to changes in both modes. The other CTs with the weakest response to strengthening of "their" mode (#2, #3, #5, #18) all have relatively high pattern correlation with mode 1 (|r| > 0.5), and the positive trend due to "their" mode is mitigated, or even overturned due to change in mode 1. The response in CTs would likely further change if the variance added to one mode was subtracted from the remaining modes in a different way than how it was done here (not tested for realworld modes). Since the agreement in the sign of change of frequencies of associated CTs seems to be the only way to distinguish sampling variability from a response to change in EV, attributing changes in CT frequencies does not seem to be a reliable approach, at least for lower-order modes.
The sensitivity of CT frequency to changes in the phase of modes is shown in Fig. 7b. The response of CT frequencies to changes in the ratio of positive and negative scores is calculated independently for each mode (the ratio for the remaining three modes being xed at approximately 50:50) utilizing datasets of 10,000 elds to minimize the effects of sampling. The most extreme ratio of scores in the reanalysis data (observed over six consecutive periods of 1,000 elds) is approximately 58:42; the most extreme change in the percentage of scores of one sign between two consecutive periods is approximately ± 10 percentage points, or ± 20% (not shown). Various ratios are shown in Fig. 7b; nevertheless, the change from 50:50 to 55:45 (60:40) is discussed in detail, as it best represents the common (most extreme) cases observed in the reanalysis. The shift to 55:45 (modes in their strong positive phase) does not lead to a change in CT frequency in any of the associated CTs robust enough to be discernable from sampling variability. The shift to 60:40 (modes in their extreme positive phase) leads to a robust change in 50% of CTs, and with the exception of #2, CTs have a robust response only to their associated mode. In the theoretical ratio 2:1 (not observed in reanalysis at decadal scale), most CTs respond to at least two different modes. The results document that even though the sensitivity of CT frequency to shifts in the phase of modes is greatest for the associated mode, all CTs respond to more (in some cases even all four, for example #2) modes. This tendency was also apparent for idealized modes, but here it is more pronounced, likely because of much more equally distributed variance among the real-world modes and/or more modes. Consequently, there are fewer CTs that re ect a single mode because there are fewer cases in the data space in which all but one mode are close to their neutral phase.

Discussion And Conclusions
In Part II of our study, we focused on the temporal aspect of the link between modes of variability, or functions of spatial-temporal co-variability of atmospheric circulation elds, and CTs de ned by SOMs, complementing Part I that focused on the spatial aspect of the link. To assess the validity of results for real-world data, the study was expanded by including reanalysis data, namely the variability modes extracted from daily GPH elds over the Euro-Atlantic region. The results obtained for idealized and realworld variability modes were compared and the overall very good agreement suggests that methodological ndings obtained for idealized modes are valid for more complex atmospheric variability, in spite of notable differences in the number, complexity of spatial patterns and distribution of explained variances of the modes, and inclusion of noise in datasets based on real-world modes. Although little effect of adding noise to data on conclusions was expected due to similar ndings by Reusch et al. (2005) and Liu et al. (2006), its con rmation-together with little dependence of results on the kind of utilized modes-suggests that the idealized modes, which are easy to interpret and work with, could be used in the next step expanding the research to more complex studies including non-linear de nition and/or combination of modes.
The choices of how synthetic datasets were generated and which methods were used to analyze them were discussed in detail in Part I. The generation of datasets by linearly combining the underlying modes is very straightforward, as it reverses the way how modes are typically obtained (by PCA decomposition). It differs from previous studies that used synthetic data (Reusch et al. 2005;Liu et al. 2006), which in most cases generated elds directly from anomaly patterns, which resulted in a few well separated clusters. Such a de nition of data is very natural if one wants to assess the ability of a method to classify data. However, such data fail to mimic the continuous nature of atmospheric circulation elds. Given the fact that SOMs can be used both for classi cation and data exploratory projection, unlike SPCA, which does not provide a classi cation on its own, we are con dent that our data are more suitable for methodological studies as they make it possible that each method shows its strengths. The data also allow for different perspectives of teleconnections, that is, preferred quasi-stationary states (Risbey et al. 2015) versus leading components of co-variability, to be investigated in parallel. Regrettably, the different approach we generate data and the focus of previous studies on the spatial domain disables comparison of results.
The utilization of synthetic datasets enabled us to investigate how various changes in underlying modes affect the occurrence of CTs and, more importantly, to isolate the effects of these changes from one another. Furthermore, by generating multiple datasets that differ only in random generation of scores, or intensities, of modes, we quanti ed the sampling uncertainty typical for decadal-scale daily circulation elds. To our knowledge, modes of variability have not yet been utilized to this end, and we believe that this methodology could be extended in the future and used to evaluate the expectable variability in atmospheric circulation in general under the assumptions of stable versus changing variability modes.
Here, the sampling variability of CT frequencies obtained solely by the random generation of scores was remarkably high, and not only the range of obtained frequencies of CTs but also a narrower 90% interval in most cases exceeded the signal observed for changes in the strength (variance) or phase of modes of the rate observed in the reanalysis. Based on our results, we estimate that decadal-scale changes in CT frequency smaller than approximately ± 30% can occur without any changes in the strength and phase of any of the underlying variability modes, including their spatial structure. Consequently, future studies attempting to link changes in CTs to teleconnections should consider sampling uncertainty. Nevertheless, we stress that the exact value presented here depends on many factors, some of which were tested here (e.g., frequency of CTs) and some of which not (e.g., parameters of the classi cation, such as the number of CTs).
The signal of speci c alterations of an underlying mode consists in a concordant behaviour of most CTs associated with the mode. Here, the association was done by comparing the spatial similarity between the patterns of CTs and modes, following Yuan et al. (2015) and Rousi et al. (2020); we did not analyze the similarity in the temporal domain. Despite a variable rate of change, the sign of change in a set of CTs associated with a mode typically agrees with the sign of change in EV of the mode, and the two subsets resembling the positive and negative phases respond rather predictably to a shift of the mode toward one of its phases. However, a change in the frequency of a CT well outside of the sampling variability typically requires an unrealistically large change in the underlying mode.. This suggests that very strong changes in CT frequencies, documented for example in studies on Euro-Atlantic decadal-scale circulation variability (e.g., Johnson et al. 2008), are likely due to superimposed changes in underlying modes, that is, simultaneous strengthening and shift toward one phase of one mode and/or simultaneous changes in more than one mode. This is in agreement with the results of Jiang et al. (2013), who additionally point to the fact that the simultaneously acting factors can, on the other hand, counter-effect one another, making attribution problematic. Here, this behaviour was most apparent for the weakest mode, the contribution of which was in some cases overturned by changes in stronger modes.
Although we are aware that the conclusions relate much more closely to the concept of teleconnections as stationary linear oscillatory features, we believe that they should also be considered in studies that understand teleconnections in the continuum perspective of physical states, especially since PCA is used to delimit teleconnections within the SOM phase space.  Schematic description of the methodology using reanalysis data and Euro-Atlantic modes of variability (MOVs). An almost identical approach was used for idealized modes of variability, except that the spatial patterns and EVs of three modes were prede ned. Some additional datasets are not indicated in the scheme Figure 4 The 4×5 SOM trained using data generated from idealized modes of variability. The CTs resembling the spatial patterns of modes (|r| > 0.8 for Z and M; |r| > 0.75 for C) are framed in color: Z (red; 1, 2, 6, 7, 14, 15, 19, and 20), M (green; 3, 5, 9, 12, 17, and 18), and C (blue; 8, 10, 11, and 13) Figure 5 The sensitivity of CT frequencies of the SOM in Fig. 4 to changes in idealized modes of variability. a CT frequencies for different combinations of EVs (black dots) and for 100 versions of datasets based on identical combination of EVs (red dots). b Same as (a) but for datasets comprising 10,000 elds instead of 1,000 (only the rst row of CTs is shown). c CT frequencies for different combinations of EVs of M and C, with EV(Z) xed to 50%; datasets of 1,000 (10,000) patterns are shown by black dots (red circles) Figure 6 Differences in CT frequency occurring as a result of sampling variability (boxplots) and strengthening of modes by 20% relative to the original Z50-M30-C20 distribution of EVs (numbers) in datasets based on a idealized modes and b Euro-Atlantic modes. The boxplots show the distribution of randomly occurring differences in the frequency of CTs for 100 pairs of datasets (DS1 and DS2) randomly selected from the 100 datasets generated for the Z50-M30-C20 ratio of EVs [the differences is expressed as (DS2-DS1)/DS1×100%], separately for categories of relative frequency of CTs in DS1; the 5th and 95th percentiles are indicated by staples, the boxes are bounded by the rst and third quartiles, and the thick line is the median. The color of boxplots is explained in the text. In a, the colored numbers identify CTs as shown in Fig. 4, only CTs resembling the patterns of zonal (red), meridional (green), and circular (blue) modes are shown. In b, the colored numbers identify CTs as shown in Fig. 8a, only CTs resembling the patterns of mode 1 (red), mode 2 (green), mode 3 (blue), and mode 4 (yellow) are shown. The position of each CT is given by its average frequency across all datasets generated for the Z50-M30-C20 ratio of EVs (x-axis) and its relative change (y-axis) in the dataset in which the mode that is represented by the particular CT is 20% stronger. Smaller datasets (1,000 elds) are used to calculate the sampling variability; larger datasets (10,000 elds) are used to calculate the sensitivity to changes in EVs Response of CT frequency due to changes in the ratio of positive and negative scores of individual modes relative to the original 50:50 ratio (i.e., a shift toward positive phase of mode). a Idealized modes: CTs are organized and highlighted by boxes as in Fig. 4; the response in frequency is calculated and shown (by connected dots) separately for changes in each mode: Z in red, M in green, C in blue. b Euro-Atlantic modes: CTs are organized and highlighted by boxes as in Fig. 8a; the response in frequency is shown for mode 1 in red, mode 2 in green, mode 3 in blue, and mode 4 in yellow Thick horizontal lines delimit the 90% interval of differences due to sampling variability in datasets of 1,000 elds Figure 8 a Same as Fig. 4 except for Euro-Atlantic modes of variability; the interval of isolines is 30 gpm. The CTs resembling the spatial patterns of modes (|r| > 0.7) are framed in color: mode 1 (4, 9, 11, 16, and 19), mode 2 (1, 7, 13, and 18), mode 3 (2, 3, 15, and 20), and mode 4 (5, 12, and 16); CT #16 is classi ed with