On the Use of Self-Organizing Maps in Detecting Teleconnections. Part I: Spatial Aspects

Self-organizing maps (SOMs) represent a popular classication tool, output of which has been used to link typical synoptic-scale circulation patterns to large-scale teleconnections, or modes of low-frequency circulation variability. However, recently there have been attempts to interpret SOM output directly as a continuum of teleconnections. In the present paper, we provide a theoretical study dealing with how predened idealized modes of variability project onto SOM arrays. Three orthogonal modes are dened and their various linear combinations used to generate datasets of known structure. Multiple variants of SOMs that differ in their initialization are generated for SOMs of various shapes and sizes. The results show that how a mode projects on a SOM array is sensitive not only to data structure, but also to various SOM parameters. The leading mode of variability projects rather strongly on SOMs if its explained variance is markedly higher than that of the second-order mode; the remaining modes project considerably more weakly, and all modes tend to blend when their explained variance is close to each other, which leads to underrepresentation of some phases of modes and/or combinations of modes among the SOM patterns. Furthermore, non-linear features appear in SOM arrays even if the underlying modes of variability are strictly linear. The ndings point to limitations in the applicability of SOMs in the research of teleconnections.

and Philipp et al. (2016) for detailed reviews of the topic of circulation typing-SOMs enable approximating the broad variability in circulation elds by only a small number of representative nodes, or circulation types. Additionally, SOMs offer the possibility to organize these types into an array in which the distance between individual types approximates the (typically) Euclidean distance between them in the original multidimensional space; that is, SOMs are not only able to classify data but also represent their topological structure. This latter characteristic of SOMs is arguably the main reason of their popularity, since it eases visualization and, consequently, also interpretation of results. Comas-Bru and Hernández 2018), the latter method having arguably become the most popular one.
The SPCA decomposes data transformed into a similarity (typically correlation or covariance) matrix whose values represent relations between time series of a typically gridded input meteorological variable.
(Note that TPCA, mentioned earlier among classi cation methods, also decomposes a similarity matrix; however, this matrix represents similarity between pairs of geophysical elds, such as pattern correlation, instead of relations between pairs of variables, which yields qualitatively different results. A reader is referred to Preisendorfer (1988) for detailed information on various modes of decomposition and Compagnucci and Richman (2008) for a comparison of SPCA and TPCA in climatological context). The SPCA results in a set of new variables produced as linear combinations of input variables, which are mutually orthogonal, maximize the explained variance, and have traditionally been referred to as modes of variability (Barnston and Livezey 1987). Since it was shown by Horel (1981) that the leading modes of variability obtained by rotated SPCA well reproduce the teleconnection patterns found earlier by Wallace and Gutzler (1981), the terms teleconnections and modes of variability have been used more or less interchangeably. Although some instances can be found in literature in which the term mode of variability refers to variability patterns in general, including circulation types and/or transitions between them (e.g., Reusch et al. 2005; Lennard  On the other hand, the term teleconnection is here understood more loosely such that it accommodates other de nitions than that of a spatial function. One particular motivation behind the application of SOMs (and less often also cluster analysis) to analyses of teleconnections instead of SPCA and other linear methods is to study their non-linear features, such as differences in the spatial structure teleconnections may have in their opposite phases, which cannot be captured by the abovementioned traditional methods due to their imposed linearity and/or orthogonality (Risbey et al. 2015). A typical application of SOMs is to study the extent to which occurrences of typical atmospheric patterns are conditioned by intensity and phase of leading modes of variability (e.g., Reusch Rousi et al. 2020), which are in turn applied to describe the mode's temporal changes (in phase and intensity) and/or changes in its spatial patterns by means of analyzing shifts in the frequency of occurrence among the pertinent SOM types. Note that compared to the traditional "discrete modal perspective" (Chang and Johnson 2015), in the continuum perspective a teleconnection does not have one xed spatial pattern, although its generalized positive and negative patterns can be retrieved by weighted averaging of pertinent SOM types (e.g., Chang and Johnson 2015). The de nition of a teleconnection is in the continuum perspective shifted from a statistical link expressing co-variability among distant locations to a less constrained "low-frequency anomaly pattern" (Yuan et al. 2015), which shows a particular temporal realization of such a statistical link rather than the link itself, although in reality typical daily anomalies have been interpreted as teleconnections in this context more often than low-pass ltered data (e.g., Johnson et al. 2008;Chang and Johnson 2015;Rousi et al. 2015;Yuan et al. 2015).
A typical way to delimit individual "canonical" teleconnections (leading modes of variability de ned by linear methods) within a SOM teleconnection continuum consists in projecting modes of variability (principal components, PCs) on SOM arrays and comparing similarities between the two representations of teleconnections in the spatial (e.g., Yuan et al. 2015, Rousi et al. 2020 or temporal (e.g., Johnson et al. 2008) domain, which links the two perspectives of teleconnections together. Theoretical studies on how modes of variability project on SOM arrays seem, therefore, particularly relevant to the continuum perspective. It is, however, an area of research that has so far received only very little attention. Jiang et al. (2013) have shown that in uences of various teleconnection indices on the frequency, intensity and airmass characteristics of typical geopotential height patterns are complex and can counteract one another; consequently, attributing changes in these patterns to one particular teleconnection seems far from trivial. Most of previous studies indicate that the leading variability mode tends to project strongly on types around two opposite corners of a SOM array, and the very corners are often representing positive and negative polarities of the mode. According to Lennard and Hegerl (2015), the two diagonals of a SOM array are analogous to rst two principal components and thus to the leading two teleconnections. This somewhat contradicts the suggestion of Reusch (2010) that if the leading mode of variability projects on one diagonal of a SOM array, the second-order mode may or may not project on the other diagonal. Furthermore, Reusch (2010) suggests that differences in the length of two diagonals on which the leading modes of variability do project-measured in the phase space that approximates Euclidean distances among data (refer to Sect. 2, Sammon mapping for more detail)-stem from data nonlinearities. Only very limited attention has been given to the third-order mode, which is likely due to its very limited and scattered signal in two-dimensional SOMs. This means that all lower-order modes of variability still project on SOM arrays and in uence any results, but are ignored in the interpretation.
In order to contribute to the understanding of the continuum perspective of teleconnections as well as the links and differences between different perspectives of teleconnections, we present a two-part study that deals with spatial (Part I) and temporal (Part II) aspects of using SOMs as a tool for studying teleconnections. In the present Part I of the study, we analyze how three prede ned modes of variability used to construct various synthetic datasets project on SOMs trained with a wide range of parameters.
The paper is organized as follows: in Sect. 2, we describe the datasets and methodologies of SOMs and Sammon mapping (Sammon 1969), which is used to visualize the results; in Sect. 3, we analyze the links of SOM arrays to modes of variability and to data structure, and in Sect. 4 we discuss the results and summarize main implications for the continuum perspective and nonlinearity of teleconnections.

Synthetic data
In the study, multiple synthetic datasets with prede ned-and thus known-structure are used to train SOMs. The datasets are built using three modes of circulation variability that describe pattern-to-pattern changes in zonal (Z), meridional (M), and circular (C) ow. The three modes are de ned as mutually orthogonal and of unit length, in order that contribution of individual modes to the total variability can be fully controlled by generated indices (amplitudes). In order to avoid calculation with very small numbers and to make the resulting generated elds easier to read, all three patterns are scaled up by the same factor, the size of which is not relevant, but explains the larger isoline intervals in the spatial patterns of modes (Fig. 1). The patterns can be interpreted as anomalies of an unspeci ed circulation variable that represent the modes in their positive phase and intensity of + 1.0.
Each dataset comprises 1,000 circulation elds; each eld consisting of 20×20 grid points is a certain linear combination of the three patterns, whereby each mode is multiplied by a scalar index. The index therefore represents the intensity of a particular mode at a particular time instant. For each mode, the indices are drawn from normal distributions, the mean of which is zero and variance is a certain fraction (explained variance hereafter) of the total variance σ 2 = 1. A total of twelve combinations of explained variances (EV), and thus datasets, are prede ned (Table 1) such that σ 2 Z ≥ σ 2 M ≥ σ 2 C . For example, in the dataset "Z70-M20-C10", 70% of the total variability is in the zonal ow, while the contribution of the meridional and circular ow is 20% and 10%, respectively. We also de ne several simple "bivariate" datasets in which the third mode is "turned off" (e.g., Z70-M30). Note that datasets in which another mode than the zonal mode was the strongest (e.g., Z30-M70) were not included in the study as they did not lead to any new ndings.

Training of self-organizing maps and quality metrics
Here, we describe the methodology of training, evaluating and selecting SOMs. The methodology is indicated in Fig. 2; choices of parameters along with their effects of results are further discussed in Sect. 4.2. A reader is referred to Kohonen (2001) and Sheridan and Lee (2011) for a more detailed description of SOMs. Note that in the paper we use the term type instead of node, in order to prevent confusion between the terms node and mode.
The training of a SOM constitutes an iterative and unsupervised process. At the start, n starting types, or vectors (in our case, generated circulation elds) are randomly selected from a dataset to initialize the SOM array; n also equals the resultant number of circulation types. Subsequently, each of the daily patterns is step by step and in many iterations projected on the array and the position of types that are close (in terms of Euclidean distance) to the pattern is adjusted. The adjustment depends on three parameters that have to be speci ed in advance. First, the neighborhood radius (further referred to as radius for short) de nes the size of the affected area around the projected data point that is adjusted. Second, the neighborhood function de nes the mode of the adjustment, and third, the learning rate speci es how much the types within the affected area are attracted to the projected data point. The Gaussian neighborhood function was chosen here as it led to SOMs with somewhat better topology than the commonly used bubble function. The value of the remaining two parameters is typically chosen to decrease in each iteration step, which allows for capturing the overall topological structure of the dataset rst and then ne tuning the resulting classi cation during the later iteration steps. All choices of parameters are discussed in Sect. 4.2.
The "Kohonen" R package (Wehrens and Buydens 2007; Wehrens and Kruisselbrink 2018) was used to train SOMs. For each dataset, SOMs of 13 different shapes (number of rows × number of columns) of arrays were calculated: 2×3, 2×4, 3×3, 3×4, 3×5, 4×4, 4×5, 4×6, 5×5, 5×6, 5×7, 6×6, and 6×7, resulting in classi cations into between six (2×3 SOMs) and forty-two (6×7) types. The number of iterations was set to 1,000; a higher value did not lead to higher quality SOMs, likely due to the relatively simple structure of data. The initial value of the learning rate was set to between 0.04 and 0.12 for smallest and largest SOMs, respectively, and to linearly decrease to 0.01 in all cases. The radius parameter was initialized as the 66th percentile of type-to-type Euclidean distances, calculated from coordinates of types within the SOM array (i.e., row and column), and was set to linearly decrease over the 1,000 iterations to a speci c terminal radius value. Since Jiang et al. (2015) and Gibson et al. (2017) pointed to the sensitivity of SOMs to this particular parameter, and this sensitivity was also very apparent in our preliminary analyses, multiple values of the terminal radius were tested for each SOM shape -dataset combination. This complex issue will be described in detail in Sect. 3.1.
Two kinds of quality metrics are typically used to assess SOMs-quantization error and topological error (Kohonen 2001). Although their exact speci cation varies among studies, their purpose does not: While the quantization error shows the degree of data separation into clusters-and can be therefore understood as a measure of quality of a classi cation, the topological error shows the degree to which the organization of types within a SOM array captures the structure of classi ed data. Since both metrics depend on prede ned parameters, they are typically used to select a single optimum SOM of multiple ones trained. Here, quantization error of a SOM is calculated as the Euclidean distance between each circulation eld and its closest type, averaged over all 1,000 classi ed elds. The topological error represents the percentage of circulation elds whose two closest types do not lie next to each other in the SOM array.

Sammon mapping
Sammon mapping is a non-linear dimension reduction method that maps multivariate data points (here, 400-dimensional circulation elds) to lower-(here, two-) dimensional space such that the data structure is approximately preserved; that is, inter-point distances (typically, Euclidean) in the original and the projection Sammon space correspond (Sammon, 1969). This is achieved by iteratively looking for a solution that minimizes the error function where denotes the Euclidean distance between ith and jth circulation elds in the original space and denotes their Euclidean distance in the projection (Sammon) space.
The purpose of Sammon mapping in the study is twofold. First, it is used as an additional quality measure to assess the topological structure of trained SOMs. Second, it is utilized as a means of analyzing the links between the generated datasets, the position of SOM types within the data, and the underlying modes of variability (structure of data). The following approach is used: First, a Sammon map is calculated for each dataset; the function "sammon" of the R package MASS (Venables and Ripley 2002) is used to this end. To plot a SOM onto a Sammon map, each type is projected onto the map by nding the point in the Sammon phase space (at a step of 1×1 unit) for which the projection error F (Eq. 2) is the smallest: where denotes the Euclidean distance between the ith circulation eld and type t in the original (projection) space. The same method can be used to project any circulation eld, which we utilized for projection of multiples of patterns of variability modes to analyze how the modes are oriented within the maps.
In Fig. 3, Sammon map of one dataset is shown together with one SOM and its projection onto the Sammon map. Note that in the Sammon map in Fig. 3a, relative to the orthogonal array of types in Fig.  3b, the array is slightly deformed such that the distance between individual SOM types approximates the Euclidean distance between the spatial patterns of the types in the original space.

Training and selection of SOMs
Although calculation of a SOM constitutes an objective process that does not require any interventions on the part of the researcher, multiple parameters affect the eventual properties of a SOM, many of which require subjective decisions to be made a priori. Above all, the number of types (size of the SOM array) determines the degree of generalization, and the asymmetry of the SOM array (difference between the numbers of columns and rows) potentially affects the ability of a SOM to represent datasets with various structures (differences in the number and explained variance of individual modes of variability). Furthermore, previous research has shown (see, e.g., Rousi et al. 2015) that when selecting a SOM, one typically has to balance between the quality of classi cation (as expressed, for example, by the quantization error) and the quality of SOM topology-the goal of the study being the decisive factor. The aim of the present study required selecting SOMs with good topological structure, however, without compromising quality of classi cations.
First, for each of the twelve datasets and thirteen SOM shapes, a total of 320 (200) variants of SOMs were trained for SOMs up to (larger than) 4×4; the difference stems from markedly longer computation time of larger SOMs. The variants differ in their initialization (20 variants were calculated for each dataset -SOM shape combination) and also in the terminal radius: For smaller (larger) SOMs, values between 0.0 and 1.5 (0.9 and 1.8) were tested, in both cases at a step of 0.1. For small SOMs (especially if combined with datasets with simple and clear structure, that is, two modes of variability of dissimilar strength), best results (in terms of both quantization and topology) are typically achieved for low terminal radius. For larger SOMs, one has to choose larger values of the terminal radius to avoid deterioration of the topological structure (except for the simplest data).
Second, one SOM was eventually chosen for each dataset and SOM shape (therefore, a total of 156 SOMs were analyzed), by selecting the SOM with the smallest quantization error among the SOMs in whose projections onto the Sammon space folding of arrays (crossing of types-to-type links) did not occur. In case multiple SOMs met these conditions, the one with the smallest topological error was selected. In Fig. 4, quantization and topological errors and their dependence on terminal radius, SOM shape and dataset structure are shown for selected cases.
Not surprisingly, SOMs with lower quantization error are those with larger arrays, which better explain data variations. Furthermore, the overall larger error in the case of datasets with variance spread more evenly among the modes is due to overall larger inter-pattern Euclidean distance in these datasets (not shown). Of particular note is a marked drop in the error when the terminal radius drops below 1.0-but no further decrease is observed beyond. For future studies, experimenting with the terminal radius close to the threshold of 1.0 may, therefore, su ce. However, this may not be the case if large SOMs were combined with data of little structure (similar variance of modes). Here, for example, terminal radius of 1.4-1.6 was the minimum that led to SOMs with reasonable topological structure for arrays of [30][31][32][33][34][35][36][37][38][39][40][41][42] types combined with datasets of little structure (not shown here).
The link between the topological error and SOM parameters is somewhat less clear, especially for terminal radius above 1.0: the link can be indifferent, directly as well as indirectly proportional, depending on the particular dataset -SOM shape combination. A sharp increase of the error can be, nevertheless, seen in nearly all of the 156 combinations tested when the terminal radius drops below 1.0, except for a few cases concerning the simplest datasets. The relatively high spread of the error, especially when terminal radius is set under 1.0, highlights the sensitivity of SOMs to its initialization and, thus, the importance of calculating multiple variants of SOMs to avoid the risk of achieving only suboptimal local optima of classi cation. Despite comparable quantization error for all variants differing only in their initialization, classi cations with identical parameters can lead to two or even more local optima of the topological structure-note for example the three local optima for Z50-M40-C10 dataset, 3×4 array, and radius of 0.7 in Fig. 3b. The Sammon maps of three SOMs that represent the three optima are shown in Fig. 5 (notice the SOM in Fig. 5a has markedly better topology compared to the other SOMs). Multiple local optima in SOM topology are often found for oblong SOMs of data that have equal or close to equal strength of two modes (zonal and meridional, or meridional and circular). Therefore, an assessment of data structure by PCA may represent a useful tool prior to SOM classi cation in terms of selection of some SOM parameters.

SOM topology and the modes of variability
Here, we present how SOM types are organized within an array according to their spatial similarity to modes of variability. Sammon mapping is utilized to visualize the topological structure of both data and SOM arrays, and Pearson's pattern correlation is used to measure the spatial similarity between patterns of types and modes. Additionally, several spatial patterns representative of various intensities and both phases of the modes are projected onto Sammon maps to obtain a generalized non-linear projection of the underlying linear modes.
First, the simplest cases are presented, in which data variations are determined only by two modes of variability (zonal and meridional). Second, SOMs of datasets generated from all three modes of dissimilar strength are presented. Last, we focus on the speci c behavior of SOMs that occurs if all three modes have nonzero variance and equal strength of two or all three modes. In each case, we rst focus on how the modes project on the Sammon maps of data, and then analyze the relation between the projections of modes and SOMs.

Two modes of variability
In the three simplest datasets (Z90-M10, Z70-M30, Z50-M50) we test the organization of SOM types relative to the two underlying variability modes, and mainly whether projections of the two modes are orthogonal to each other in the nonlinear Sammon projection, and whether the modes can be linked to the diagonals of SOM arrays, as suggested before by Lennard and Hegerl (2015).
The generated circulation elds are clearly organized in the Sammon map according to their spatial similarity to patterns of variability modes. In all three datasets, projections of the two modes are orthogonal to each other. In Fig. 6, this is apparent both indirectly from correlations between data and modes and directly from projections of idealized patterns representing the modes onto the Sammon maps. Furthermore, projections of these idealized patterns are linear. In the two datasets of unequal strength of modes, the modes project along the two axes of the Sammon space (Fig. 6a,b). In the special case of their equal strength (Fig. 6c), the modes remain orthogonal after projection, but do not project on the horizontal and vertical axes of the Sammon map. This is likely due to the fact that the horizontal axis is by default oriented in the direction of maximum explained variance (in our case, this is true for all datasets with one leading variability mode), which is unstable for data of spherical (rather than elliptic) shape. However, this does not affect our analysis, as the relative links between the projected objects do not change if the Sammon map is linearly transformed (e.g., rotated).
In Fig. 6, three SOMs (4×4, 4×5, and 4×6) are projected onto each of the three Sammon maps (data sets) to illustrate the link between their topology and projection of modes. The diagonals of the square 4×4 SOMs project on the axes of the Sammon space in the case of Z90-M10 and Z70-M30 datasets, that is, opposite corners of SOM arrays depict opposite phases of modes in their rather extreme state. In the case of Z50-M50 (Fig. 6c), however, the modes strongly project on the (theoretical) center row and column of the array, with the corner types representing elds with about equal contribution of both modes.
In general, with an increasing elongation of SOM arrays, its center row tends to be aligned with the leading mode rather than a diagonal, but a wide range of results was obtained in different datasets.
Consequently, the leading mode may strongly project on two opposite corners (as in 4×5 and 4×6 SOMs of Z90-M10 data), on only one corner (4×5 SOM of Z70-M30 data) or none at all (4×6 SOM of Z70-M30 data). The weaker mode does not project most strongly on the remaining two corners but rather on the types next to and second next to the corners in 4×5 and 4×6 SOMs, respectively. The corner types themselves represent more complex elds with non-negligible contribution of both modes. The results obtained for SOMs of other shapes are similar to those presented above: Modes project well on the diagonals (center row and column in the case of Z50-M50) of square SOM arrays regardless of their size, while only the leading mode occasionally projects on a diagonal in oblong SOMs.

Three modes of variability of different strength
Four datasets composed of three modes of variability of different strength were analyzed in the study (Z70-M20-C10, Z60-M30-C10, Z50-M40-C10, and Z50-M30-C20). Relative to the bivariate datasets, shifting 10-20% of total variability to the third-order circular mode does not markedly affect the projection of the rst two modes onto the Sammon map: the zonal (meridional) modes project on the horizontal (vertical) axes of the projection plane (Fig. 7). When projecting the circular mode on the Sammon map, its stronger realizations (i.e., ± 1×C and beyond) appear along the vertical axis (i.e., orthogonally to the leading mode, that is, more or less parallel to the second-order mode). However, elds with such strong contribution of circular ow are generated only very rarely in these four datasets, the maximum generated indices of the mode being-in absolute values-1.23 (1.49) in C10 (C20) datasets, and the 95th percentile being only about 0.6. On the contrary, the ± 0.5×C patterns project in all four cases very close to the center of the Sammon map. In general, elds highly correlated with the circular mode are scattered across the whole Sammon map, but their highest density is close to the center. In the center, elds with only weak circulation ("turned-off" zonal and meridional modes) occur, and therefore, only a small contribution of the circular mode leads to relatively high pattern correlation between its pattern and data elds.
The orientation of SOM arrays relative to modes of variability and the shape of the arrays also resemble those obtained for bivariate datasets: the two leading modes project onto diagonals in all square SOMs except those of the Z50-M40-C10 dataset. On the other hand, the two leading modes project along the central row and column in most SOMs that are strongly elongated (i.e., those with two more columns than rows), the corner types then comprising linear combinations of the two leading modes. The orientation of SOMs with one more column than rows varies between the typical orientation of square and strongly elongated SOMs, depending on dataset and SOM parameters (number of types, terminal radius). The corner types of these slightly elongated SOMs can be either linear combinations of two or three modes or strong projections of one particular mode. In the latter case, patterns in opposite corners typically are not mirror images of each other (see Fig. 8a,b); such a situation is further referred to as "nonmirrored patterns".
Consistent with the projection of the ± 0.5×C patterns close to the center of Sammon maps, also types resembling the pattern of this mode occur close to the center of SOM arrays. Note, however, that the proximity to the center is in terms of the Sammon map rather than SOM array coordinates. This is apparent in types [1,3] in Fig. 8b, and [4,3] and [4,4] in Fig. 8c, which-despite their location at the edge of SOM arrays-lie relatively close to the center of their respective Sammon maps, due to a rather deformed shape of the SOM array in the former case (Fig. 7b) and rather small array (consequence of a higher terminal radius necessary to produce a SOM with good topological structure) in the latter (Fig. 7c). However, the occurrence of types resembling the circular mode in the center does not mean that other types do not show signs of circular ow; rather the opposite holds, which is apparent from the spatial patterns of types (Fig. 8).
Both the Sammon map and projection of modes in the case of Z40-M40-C20 data (not shown) resemble those of Z50-M40-C10 (Fig. 7b), except the leading two modes are not aligned with the Sammon map axes, similarly to Z50-M50 (Fig. 6c).
The three cases with equal strength of the meridional and circular modes are presented in Fig. 9, including Sammon maps of 4×4, 4×5 and 4×6 SOMs-the spatial patterns of the latter are shown in Fig. 8d-f. In all three cases, the leading mode is approximately aligned with the horizontal axis of the Sammon map. However, despite the equal strength of the two remaining modes, the three cases differ in how these two modes project onto the Sammon map of data as well as onto the SOMs.
In the case of Z60-M20-C20 (Fig. 9a), both modes project onto the vertical axis of the Sammon map, such that their positive (negative) values occur mostly above (below) the horizontal axis. A similar distribution is also apparent in patterns of SOM types (Fig. 8d) In the case of the Z50-M25-C25 dataset (Fig. 9b), the three modes project onto the Sammon map similarly to Z50-M30-C20 (Fig. 7c), with the zonal and meridional modes aligned with the two axes and the circular mode scattered across the map with increased density of high correlations in the center of the map. The SOM is in this case deformed such that its upper left corner (see type [1,1] in Fig. 8e) and the center of the bottom row (type [4,4]) lie close to this area of increased data density and represent both phases of the circular mode. On the other hand, strong projections of the meridional mode occur in types that lie farthest from the horizontal axis in the Sammon map (types [1,3] and [4,6]). Thus, while two corners of the SOM that lie close to the horizontal axis show opposite phases of the leading mode, the remaining two corners show one phase of one of the remaining two modes each.
The Sammon map of the Z40-M30-C30 dataset (Fig. 9c) reveals that in this case it is the circular mode that projects orthogonally to the leading zonal mode. The projection of the meridional mode on the vertical axis is somewhat less clear than in Z60-M20-C20, but still apparent; note, however, that it does not have a clear representation among the types (Fig. 8f). Nonetheless, the northerly (southerly)

On data
In research on methodology to study spatial-temporal variability of atmospheric circulation, synthetic datasets play an essential role. The structure of such datasets is a very important factor since it constrains the results to a large extent. This seems to be of particular importance if one compares the two alternative perspectives of teleconnections, that is, the spatial patterns of temporal co-variability on one hand and the regime-based approach that sees teleconnections as typical anomaly patterns on the other.
In the two previous studies that directly compared the skill of SPCA and SOMs in extracting known patterns (modes) from synthetic datasets (Reusch et al. 2005;Liu et al. 2006), most of datasets were generated by repeating prede ned patterns, which led to datasets consisting of a few distinct clusters. These clusters were represented well by SOMs in the latter study if there were at least as many SOM types as clusters; in the former study, considerably more types were necessary to avoid blending of clusters. SPCA, on the other hand, was in most cases unable to capture the spatial structure of the prede ned modes, even though the data were strictly linear. Datasets generated in such a way seem bene cial to study methods and their ability to recognize such clusters; however, the application of SPCA on such datasets is inherently meaningless, since it is not designed to recognize clusters in realizations, Consequently, to obtain data that could be used to compare different perspectives on teleconnections and avoid biased results, we construct synthetic datasets in a way that can be understood as reversed SPCA-that is, constructing data from prede ned orthonormal vectors (loadings) that carry the information on how grid points co-vary and times series of amplitudes randomly drawn from normal distributions that carry the variability itself. A resulting dataset forms a continuum of circulation elds rather than a few distinct clusters, which better approximates the structure of real circulation-see, e.g., Fig. 7 in Philipp et al. (2016). More importantly, datasets de ned in such a way can still be successfully classi ed and do not limit the applicability of SOMs in any way, but they can also be linked to underlying modes of variability. Therefore, they make it possible to study both perspectives of teleconnections in parallel, in our case by means of visualizing both SOMs and the underlying modes of variability in one projection plane, for which the nonlinear Sammon mapping is utilized.
We did not add noise to the generated datasets as according to the results of Reusch Fig. 9 there) clearly show that linear methods blend those modes whose spatial patterns are correlated. Therefore, the three modes were de ned such that they are mutually uncorrelated. We also de ned the modes to be mutually equidistant, since one can hypothesize that SOMs may also tend to blend modes that are relatively close to one another, in terms of the distance metric.
All in all, even though our data are more complex than data used in previous similar studies, they are still relatively simplistic compared to real-world data. In Part II, additional datasets will be generated from modes extracted from reanalysis data to assess whether ndings based on simplistic datasets hold in the real world.

On methodology
Beside the datasets, two other speci c issues seem particularly important for our study due to its aim at methodological aspects. First, any results are constrained by the quality of SOMs, while the assessment of quality is to a large extent goal-speci c and a result of choice of multiple parameters (e.g., Gibson et al. 2017). Previous studies provide only limited guidelines regarding these choices, and in many cases the parameters were selected randomly or not mentioned at all (Sheridan and Lee 2011). The second issue regards the application of Sammon mapping beyond a mere tool for validation, toward providing a visual link between data, modes and SOMs.
The choice of parameters was indicated in Fig. 2 and Sect. 2.2. The parameters were chosen experimentally based on an initial sensitivity study, with the exception of the algorithm type and grid topology, which were set to sequential and rectangular, respectively. Kohonen (2013) suggests that the batch algorithm be used instead of the sequential one, as it is faster and safer. However, it is not clear to what extent this recommendation relates to the typical utilization of SOMs in synoptic climatology consisting in de ning relatively very small SOMs used for classi cation rather than for exploratory data projection. According to Jiang et al. (2015), the sequential algorithm was used by the majority of research in synoptic climatology.
The training was initialized using randomly selected data points. We did not choose the alternative initialization from leading two principal components to assure that the relative orientation of SOMs and prede ned modes in the Sammon space was not biased by this step. Since random initialization was quite sensitive to initial conditions, 20 variants of SOM were calculated for each combination of the remaining parameters. Furthermore, the commonly used Euclidean distance was chosen as the metric of similarity, as other metrics provided by the package (e.g., Manhattan, sum-of-squares) did not lead to higher quality SOMs. Last, we kept the number of training iterations fairly low (1,000), and the training was done in a single phase. Increasing the number of iterations and/or splitting the training into two phases of "rough" and " ne" ordering, as suggested by Kohonen (2013), did not lead to qualitatively different SOMs, probably due to the combination of small arrays and small datasets.
The package provides two neighborhood functions: bubble and Gaussian. Given the limited options provided by the algorithm, we did not test other functions such as the Epanechikov function that was found superior by Liu et al. (2006), although Jiang et al. (2015) found no notable difference between Epanechikov and Gaussian functions for SOMs optimized for classi cation. Both available functions led to SOMs with good topology; however, each of them required a speci c range of neighborhood radii. In general, the Gaussian function led to SOMs less sensitive to the radius, notably to its lower terminal value. According to Wehrens and Kruisselbrink (2018) the Gaussian neighborhood always adjusts all types to some extent, even if the terminal radius is close or equal to the threshold of 1.0. For the bubble neighborhood, iterations calculated at the threshold account only for the winning type, which turns the process into k-means (Kohonen 2001). Regardless the function, in our case such iterations typically led to a marked change in quantization (decrease) and topological (decrease) errors and to dissipation of SOM topology. Gibson et al (2017) suggest training two sets of SOMs that differ in the inclusion of "k-means" iterations, arguing that the two sets may complement each other, one providing good topology, the other good clustering. Since SOM topology itself was the subject of our study, SOMs were only allowed to include a "k-means" phase if no folding of SOMs was detected in the Sammon projection.
Note that radius less than 1.0 mentioned in the study has no physical meaning and is speci c to the "Kohonen" R package; there may be differences in other available software (Gibson et al. 2017). It allows for specifying the fraction of iterations with no neighbor modi cation: for example, radius set to linearly decrease from 3 to 0 means that the last one third of training does not modify neighbors.
We experimentally found that the effect of the learning rate and the initial radius was relatively small, once obvious errors were removed. In a few cases, which were discarded, an insu cient initial radius led The last point regards the use of Sammon mapping. Strictly speaking, Sammon mapping adds a step into the analysis that could be considered redundant or even introducing new uncertainty due to the black-box-like character of the projection that utilizes the nonlinear relations among all data points to de ne the projection plane. However, the idealized rectangular array of a SOM limits the study of its topological structure to such an extent that the bene ts of Sammon mapping outweigh the negatives. Furthermore, utilization of a data projection technique allows one to see individual data points instead of SOM type frequencies only, which makes it possible to show how the linear modes of variability project on data in a nonlinear continuum framework.

On using SOMs to study teleconnections
Given the high number of combinations retained for the analysis (13 SOMs for each of the 12 datasets), only a relatively small subset of results could be chosen for presentation that, nevertheless, well represents the range of results. While the number of SOM types (size of array) does not seem to play that important a role in SOM topology (contrary to its strong link to explained variation), the shape of the array (ratio of the number of columns and rows) as well as the structure of data are important factors.
The projection of two leading modes of variability on SOM array diagonals, a topology pattern suggested in earlier studies (e.g., Reusch et al. 2007), seems quite rare. An alignment of SOM diagonals with two leading modes-which results in mirrored patterns of positive and negative phases of modes in opposite corners-was detected only in square SOMs (in about 50% of cases). On the other hand, in about 60% of strongly elongated SOMs (2×4, 3×5, 4×6, and 5×7) another topology pattern was observed, in which the two leading modes are approximately aligned with the center row and column of the array, the diagonals then representing their combinations. Consequently, SOMs with an odd number of rows/columns show the spatial structure of modes somewhat better, especially those with small arrays. However, most of the remaining cases, and also most large SOMs and SOMs of datasets in which two (or all three) modes have approximately the same strength, cannot be described with either of the two idealized topology patterns, the projection and identi cation of modes becoming less straightforward.
The projection of the third-order mode on SOM arrays has so far received only little attention. Here, when the circular mode is weaker than the leading two modes, it projects strongly only on a few types close to the center of the array; but some curvature of isolines is apparent in most types. This is in line with how the mode projects on the Sammon map of data-elds relatively strongly correlated with the pattern of the mode may appear anywhere in the phase space, but in the center of the Sammon map such cases are more common, and refer to elds in which both leading modes are close to their neutral phase. As a result, such elds do not comprise marked overall anomalies and occur, therefore, close to the center of The identi cation of the second and third modes in SOM arrays becomes particularly challenging if their strength is identical. Several different topology patterns were found here, each pointing to a particular limitation of SOMs. First, one of the modes may project relatively weakly onto one or a few types (typically in the center of the SOM) as if its variance was smaller than that of the other mode-the mode thus being underrepresented. Second, parts of the SOM array not occupied by patterns similar to the leading mode (e.g., two opposite corners) may represent one (any) phase of one mode each, the remaining phases being underrepresented. Third, one part of a SOM array (e.g., upper half) may predominantly show one particular combination of phases (positive or negative) of the two modes, the other (e.g., lower) half showing a different combination of phases, but not necessarily opposite to the upper half for both modes, the remaining combinations of phases being underrepresented. Each of these three topology patterns-if forced-may have speci c utility, except that the pattern itself is largely outside of researchers' control due to the unsupervised nature of SOMs. In real data, the underrepresentation of speci c combinations of phases of modes may be, but equally likely may not be, a consequence of these combinations occurring less frequently in the training dataset-here it was a result of only minor differences due to sampling. This illustrates the fact that one has to pay a considerable price for the topology preservation constraint of SOMs, relative to simple cluster analysis.
Last, it has to be stressed that in both the Euro-Atlantic and the Paci c-American regions at least four modes of variability are necessary to describe variations in winter circulation, summer circulation being even more complex (Barnston and Livezey 1987; Casado and Pastor 2012; Hynčica and Huth 2020).
Therefore, selective and uncontrollable underrepresentation of important modes of variability and their interactions will be an even more important issue in SOMs of real data.

On using SOMs to study the nonlinearity of teleconnections
We clearly show that the topology of SOMs strongly depends on the structure of data. Even if the leading two modes of variability project along elements of a SOM array (such as its diagonals), marked irregularities can be identi ed in the spatial patterns of types if the data comprise more than the two modes of variability. One way these irregularities demonstrate themselves in the spatial patterns of SOM types is a presence of non-mirrored patterns in opposite types within a SOM array, for example in opposite corners. Reusch et al. (2007) pointed out that such non-mirrored patterns are a feature that distinguishes SOMs from linear analyses (such as PCA). However, one needs to remember that such apparent nonlinearities do not necessarily point to nonlinearity of the underlying modes of variabilitynon-mirrored patterns in (any) two opposite positions within a SOM array clearly prevail even if data are linear. In the case of our synthetic datasets, these differences may be due to sampling, or simply be artifacts of projecting multidimensional data onto a two-dimensional phase space. We showed that especially lower order modes can project on SOM arrays in various ways, which, nevertheless, all lead to some kind of selective underrepresentation of co-occurrences of some phases of modes. Our data can be understood bivariate or trivariate-as regardless of the number of grid points, or input variables, all variability can be linked to only two or three modes. The fact that nonlinearity appears so clearly in SOMs of such simplistic datasets suggests that it will occur in any complex (multimodal, noisy) real-world data regardless of how well such data may be approximated by linear models. The tendency of nonlinear methods to detect nonlinearity where there is none has already been documented (Christiansen 2005), and one should, therefore, not forget the strong predisposition of results by the character of the used research method.

Outlook
The topology of SOMs relative to underlying modes of variability seems to be rather sensitive to data sampling. Here, we focused only on the topology; however, the response of the frequency of occurrence of individual SOM types to changes in the underlying modes may also be of interest, especially regarding  The dependence of quantization and topological errors on terminal radius (horizontal axes), SOM shape (columns) and dataset structure (rows). 20 variants of SOMs were de ned for each radius; variants selected for the study are highlighted by red crosses. The red box highlights three local optima of the topological structure (see text and     As in Fig. 7, except for (a) Z60-M20-C20, (b) Z50-M25-C25, and (c) Z40-M30-C30