## Conditional *log(j)* distributions as a function of potential

Figure 1 displays one representative potentiodynamic polarisation (*j* vs *E*) curve per experimental combination of [NaCl] and scan rate. First, as expected, the corrosion activity globally increased with chloride concentration. It is well known that the higher the [Cl−], the higher the corrosion activity, while the slower the scan rate, the more time Cl− has to diffuse to the surface, where it initiates and propagates localised corrosion2,5,11. In a recent deep learning framework for uncovering compositional and environmental contributions to pitting resistance, chloride was found to be universally detrimental for passivating alloys62.

Noticeably, the magnitude of the recorded *j* values (~5 × 10²–1 × 10³ µA/cm²) was not comparable to that found for 316L in chloride media during macro-scale polarisation63,64, where the reported values were typically ~10³–10⁴ times lower. Despite the sparse literature on SECCM applied to corrosion, a few papers have converted the corrosion current *I* measured with the meniscus cell to current density27,65. The following *j* ranges (µA/cm²) were reported for pure Al65, pure Mg65 and polycrystalline zinc27 during potentiodynamic polarisation: ~1 × 10²–1 × 10³, ~5 × 10³–5 × 10⁴ and ~1 × 10⁶–1 × 10⁷, respectively. Thus, the *j* values measured here for 316L SS were comparable to those determined for pure Al, which is also considered a passive metal. Gateman et al.65 observed that the absolute values of corrosion *j* obtained with the SECCM were much higher than those found with classical polarisation. This may be due to the fast scan rates employed66 and/or the effect of miniaturising the electrochemical cell67, where the concentration of reducing species within the electrolyte increases due to the change in the diffusion field’s geometry, thus increasing the anodic corrosion reaction rate.

To infer the complete distribution of the current density based on hundreds of curves, the frequency of *log(j)* as a function of applied anodic potential is presented in Fig. 2 for all datasets. In other words, each 3D plot in Fig. 2 offers the frequency histogram (%) of the conditional *log(j)* for each *E* interval, thus serving as probability surface maps.
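The construction of such a probability surface can be illustrated with a minimal sketch. It assumes that all curves of one dataset have been pooled into two flat arrays, `E` and `log_j`; the function names, the bin edges and the synthetic data are hypothetical and serve only to show the row-wise normalisation, not the processing actually used for Fig. 2.

```python
import numpy as np

def conditional_histograms(E, log_j, e_edges, logj_edges):
    """Frequency (%) of log(j) within each potential interval.

    Returns an array of shape (len(e_edges) - 1, len(logj_edges) - 1);
    every non-empty row (one E interval) sums to 100.
    """
    E = np.asarray(E, dtype=float)
    log_j = np.asarray(log_j, dtype=float)
    # Joint counts over (E interval, log(j) bin)
    counts, _, _ = np.histogram2d(E, log_j, bins=[e_edges, logj_edges])
    row_totals = counts.sum(axis=1, keepdims=True)
    # Normalise each E interval separately; empty intervals stay at zero
    return np.divide(100.0 * counts, row_totals,
                     out=np.zeros_like(counts), where=row_totals > 0)

def conditional_modes(freq, logj_edges):
    """Centre of the most frequent log(j) bin in each E interval.

    Only the highest bin is reported; for an empty interval the first
    bin centre is returned.
    """
    centres = 0.5 * (logj_edges[:-1] + logj_edges[1:])
    return centres[np.argmax(freq, axis=1)]

# Synthetic stand-in data (assumption): log(j) drifts upwards with E
rng = np.random.default_rng(0)
E = rng.uniform(0.5, 1.4, size=5000)
log_j = rng.normal(loc=2.0 + 0.5 * (E - 0.5), scale=0.3)

e_edges = np.linspace(0.5, 1.4, 10)      # 9 potential intervals
logj_edges = np.linspace(0.0, 5.0, 51)   # 50 log(j) bins
freq = conditional_histograms(E, log_j, e_edges, logj_edges)
modes = conditional_modes(freq, logj_edges)
```

Each row of `freq` is one conditional histogram, so stacking the rows along the potential axis gives a surface of the kind described above. Detecting several modes per interval would require a peak-finding step that this sketch omits.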

These surface plots were coloured according to the rainbow spectrum, from violet to red (lower to higher overpolarisation regions, respectively). Because warmer colours correspond to higher applied *E*, higher current density values tend to appear in red/orange/yellow in this representation. The mode value(s) of the conditional *log(j)* at each potential are highlighted by black markers.

At *E* ≈ 0.5 V, the *log(j)* distributions were approximately unimodal (mode at ~1 × 10² µA/cm²), regardless of the testing aggressiveness. These well-defined regions (highest frequencies among all surface plots) corresponded to the active (pre-passive) region. The distribution was more platykurtic (negative excess kurtosis) in 0.005 M NaCl (a) than in 0.01 M NaCl (b, c), which presented a positive skewness. In 0.05 M NaCl (d, e), the distributions were even more positively skewed, suggesting that higher [Cl−] induced more positive *log(j)* values already at 0.5 V.

The medium overpolarisation regions (represented by the blue/green points in the 3D plots) correspond to the passivity domain (a region where the current density is independent of the applied potential2). In these *E* ranges, the conditional mode values of *log(j)* presented the least dispersion (jpass). Although these distributions were generally unimodal, increasing the testing aggressiveness (from a to e) raised the probability of collective outliers larger than jpass. This positive shift in the current resulted from pitting corrosion events, as expected within the passivity domain of 316L in chloride-containing media.

In the higher overpolarisation regions (yellow/orange/red colours), a sudden increase in current density was generally observed, characterising the passivity breakdown (stable pitting formation)2. Simultaneously, the *log(j)* distributions became more right-tailed, tending to pass from unimodal to multimodal-like (more than one black marker) as *E* increased. Both trends were more easily recognised as the testing aggressiveness increased (from a to e). For instance, at 0.05 M NaCl − 50 mV/s, the data spread towards higher *log(j)* occurred at similar frequencies (apparent uniform distribution).

## Normality tests

At potential values lower than the passivity breakdown, the conditional *log(j)* distributions were generally unimodal, but could normal distributions describe these data samples? Statistical normality tests for quantifying deviations from normality were deployed for this purpose.

The Anderson-Darling test was applied to the *log(j)* distributions at each surveyed potential. The results obtained from a surface plot mainly comprising unimodal distributions (0.005 M NaCl, Fig. 2 - a) are presented in Fig. 3. The normality plots of the other surface plots are accessible in the Supplementary Information (Fig. S1). As shown in Fig. 3, at a significance level of α = 0.05, the test statistic values (grey dots) were systematically higher than the calculated critical value. Therefore, it was concluded that the conditional *log(j)* could not be considered normally distributed in the considered feature space (different combinations of scan rate/applied potential during potentiodynamic polarisation, and [Cl−]). Noticeably for 0.005 M NaCl, the distance from normality was relatively smaller in the pre-passive and passive regimes and progressively higher when entering the stable pitting domain. Although the null hypothesis of normality could occasionally not be rejected at a few sparse potentials (normal distributions observed in Fig. S1 c and d), the same overall conclusion held for all experimental configurations: the conditional *log(j)* was decisively not normally distributed.
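The decision rule used above (normality is rejected when the test statistic exceeds the critical value) can be sketched with `scipy.stats.anderson`. The helper name `anderson_normality` and the two synthetic samples are assumptions made for illustration; the sketch does not reproduce the processing behind Figs. 3 and S1.

```python
import numpy as np
from scipy import stats

def anderson_normality(sample, alpha_percent=5.0):
    """Anderson-Darling test against the normal distribution.

    Returns (statistic, critical_value, rejected), where `rejected` is True
    when the statistic exceeds the critical value tabulated for the
    significance level closest to `alpha_percent` (given in percent).
    """
    result = stats.anderson(np.asarray(sample, dtype=float), dist="norm")
    levels = np.asarray(result.significance_level, dtype=float)
    idx = int(np.argmin(np.abs(levels - alpha_percent)))
    statistic = float(result.statistic)
    critical = float(result.critical_values[idx])
    return statistic, critical, statistic > critical

n = 400
# Idealised normal sample built from evenly spaced normal quantiles
near_normal = stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)
# Strongly right-skewed sample (assumption: stands in for a pitting-like tail)
rng = np.random.default_rng(1)
skewed = rng.exponential(scale=1.0, size=n)

stat_normal, crit_normal, rejected_normal = anderson_normality(near_normal)
stat_skewed, crit_skewed, rejected_skewed = anderson_normality(skewed)
```

Applied to the conditional *log(j)* sample at each potential, the pair (statistic, critical value) is the quantity compared in a normality plot of this kind.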

## Fitting with theoretical functions

The conditional *log(j)* values were not normally distributed, but would other theoretical functions fit the data with high precision? To answer this question, we resorted to *distfit*, a Python package that fits probability densities across 89 univariate distributions and ranks them by the residual sum of squares (RSS): each candidate distribution is scored against the empirical distribution, and the best-scoring one is returned68. Figure 4 illustrates the approach to finding the best fitting function and corresponding RSS for the different *E* values (0.05 M NaCl, 100 mV/s). As the RSS is an absolute error measure, a “normalised” version of the metric (*log(j)* values normalised between 0 and 1) was preferred to account for the different data intervals as a function of *E*. In Fig. 4b - g, for illustrative purposes, the actual *log(j)* values (instead of the normalised values used in the RSS calculation) are presented. An identical approach was applied to all datasets, and the resulting plots can be seen in Fig. S2.
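The principle of RSS-based selection on min-max normalised data can be sketched as follows. This is not the *distfit* implementation: it uses `scipy.stats` directly, a hypothetical shortlist of four candidates instead of 89, and an assumed bin count of 30, purely to show how the score is formed.

```python
import numpy as np
from scipy import stats

# Hypothetical shortlist of candidates (the study screened 89 distributions)
CANDIDATES = {
    "norm": stats.norm,
    "logistic": stats.logistic,
    "gumbel_r": stats.gumbel_r,
    "uniform": stats.uniform,
}

def min_max(x):
    """Rescale a sample to [0, 1]; assumes the sample is not constant."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def rss_scores(sample, bins=30):
    """RSS between the empirical density histogram and each fitted PDF."""
    x = min_max(sample)
    density, edges = np.histogram(x, bins=bins, range=(0.0, 1.0), density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    scores = {}
    for name, dist in CANDIDATES.items():
        params = dist.fit(x)              # maximum-likelihood parameters
        fitted = dist.pdf(centres, *params)
        scores[name] = float(np.sum((density - fitted) ** 2))
    return scores

def best_fit(sample, bins=30):
    """Name and RSS of the lowest-RSS candidate."""
    scores = rss_scores(sample, bins=bins)
    name = min(scores, key=scores.get)
    return name, scores[name]

# Synthetic stand-ins (assumption): a passive-like and a pitting-like sample
rng = np.random.default_rng(2)
passive_like = rng.normal(loc=2.0, scale=0.2, size=3000)
pitting_like = rng.uniform(low=2.0, high=6.0, size=3000)

scores_passive = rss_scores(passive_like)
name_pitting, rss_pitting = best_fit(pitting_like)
```

Because every sample is rescaled to [0, 1] before the histogram is built, the RSS values obtained at different potentials refer to the same data interval and can be compared with one another.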

As presented in Fig. 4 - a, various theoretical functions (left y-axis) yielded the best-fit score across the independent variable space, all with relatively low errors (right y-axis). As for the other datasets (Fig. S2), the obtained RSS values generally presented local maxima at either low or high potential regions and were relatively steady at potentials related to the passive regions. At lower potential regions, the RSS generally increased with the testing aggressiveness, likely reflecting higher (and more diverse) activities prior to the passive regimes. Moreover, the normal distribution did not appear among the best-fitting functions, while the uniform distribution frequently appeared at high overpotentials.

At high potential regions, the RSS rose sharply (compared to the RSS baseline representing the passive regions) in the two least aggressive conditions only. In the three more aggressive conditions, instead of monotonically increasing, there was a baseline shift/curve discontinuity marking the onset of a new regime (uniform distribution fitting the data at higher potentials).

The uniform distribution presents higher randomness than the deployed unimodal distributions according to the variance criterion (for finite supports, which refers to the range of values a distribution can achieve). The conditional *log(j)* distribution described by the uniform function in regions most likely related to pitting would account for the high unpredictability of pitting corrosion.

The correlation between the uniform distributions and higher potentials can be inferred from Fig. S3, where the RSS values of the fitted curves (from all datasets) are plotted together. The uniform distribution was the best fitting function (red markers) at high potentials in 0.01 M NaCl (50 mV/s) and at high/middle potentials in 0.05 M NaCl (100 and 50 mV/s).

Regardless of the dataset, two types of parametric distributions appeared as the best fitting function: unimodal and uniform. In Fig. 4 (from b to g), histograms of *log(j)* fitted by the selected functions are shown for representative potentials. Again, while the conditional *log(j)* distributions were unimodal at low and medium potentials (Fig. 4, b - e), they were reasonably uniform at high potentials (Fig. 4, f) at 0.05 M NaCl − 100 mV/s. The same trend could be observed for the other highly aggressive conditions (Fig. 6c and e), as opposed to the least aggressive ones (Fig. 6a and b). Figure 6 presents histograms of *log(j)* at fixed and extreme (low and high) potentials for all datasets (all in the same frequency and *log(j)* scales).

Figure S4 groups the RSS values obtained when fitting the uniform function over the entire *E* range for all datasets. It can be observed that the fitting errors were inversely correlated with potential, with acceptable RSS at high potentials only. At low/medium potentials, the unimodal type of distribution generally yielded the lowest fitting errors (function names in Figs. 4 – a and S2).

With increasing potential, the frequency of *log(j)* values exceeding the mode increased, and the distributions gradually became more platykurtic and more positively skewed (Fig. 4, b - g). Overall, the range of the conditional *log(j)* distribution expanded considerably with potential. This heteroscedasticity as a function of *E* was quantified by the conditional variance of the (experimental) *log(j)* (Fig. 5). Another reason for computing the variance was that it is a measure of randomness or uncertainty. Noticeably, the data variance generally increased with the testing aggressiveness, and this trend could also be visualised for the modelled data (Fig. 7, from a to e).
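The descriptors used in this paragraph (conditional variance, skewness and excess kurtosis) can be computed per potential as in the sketch below. The input layout (a dictionary mapping each potential to its *log(j)* sample), the function name and the synthetic samples are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

def conditional_moments(log_j_by_potential):
    """Variance, skewness and excess kurtosis of log(j) at each potential.

    `log_j_by_potential` maps a potential (V) to a 1-D array of log(j).
    Returns a list of (potential, variance, skewness, excess_kurtosis)
    tuples sorted by potential.
    """
    rows = []
    for potential in sorted(log_j_by_potential):
        x = np.asarray(log_j_by_potential[potential], dtype=float)
        rows.append((
            potential,
            float(np.var(x, ddof=1)),    # sample variance
            float(stats.skew(x)),        # > 0 means right-tailed
            float(stats.kurtosis(x)),    # Fisher definition: normal gives 0
        ))
    return rows

# Synthetic stand-ins (assumption): narrow unimodal sample at low potential,
# broad uniform-like sample at high potential
rng = np.random.default_rng(3)
samples = {
    0.50: rng.normal(loc=2.0, scale=0.3, size=2000),
    1.35: rng.uniform(low=0.0, high=6.0, size=2000),
}
moments = conditional_moments(samples)
(_, var_low, skew_low, kurt_low), (_, var_high, skew_high, kurt_high) = moments
```

A negative excess kurtosis, as obtained for the uniform-like sample, corresponds to the platykurtic shape mentioned above, and plotting the variance column against potential gives a curve of the kind shown in Fig. 5.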

The fitted *log(j)* distributions of Fig. 7 are the probability surface maps of Fig. 2 modelled according to the best fitting functions (Figs. 4 – a and S2). In each 3D plot, unimodal curves were given the same colour, while uniform distributions were coloured red.

As observed in the experimental curves (Figs. 1 and 2), the current densities generally increased until reaching passivity, where relative stability preceded the passivity breakdown (Fig. 7). The exception was 0.05 M NaCl − 50 mV/s (Fig. 7 - e), where passivity was hardly reached, as indicated by the mode values (black markers) increasing monotonically across the *E* range.

In terms of data dispersion, the distributions remained relatively constant for 0.005 and 0.01 M NaCl (both at 100 mV/s) while in passive condition (Figs. 7a, b). At high potential regions, however, while the mode of *log(j)* linearly increased as a function of *E* in 0.005 M NaCl, it presented a considerable fluctuation in 0.01 M NaCl (100 mV/s). The nonlinearity of the mode observed at high potentials for the latter was illustrative of the diversity of the current distribution when pitting was likely (justifying the implementation of various functions for proper fitting). Such data distribution complexity of 0.01 M NaCl (100 mV/s) can be further appraised in the 2D modelled surfaces (Fig. S5) (not all data points can be visualised in the 3D representations of Fig. 7).

Concerning the more aggressive conditions (Figs. 7 - c, d, e), the *log(j)* dispersion progressively rose from the pre-passive to the pitting regions, following the variance plot (Fig. 5). In these cases, the considerably high data scatter at higher potentials, characterised by multimodal/amodal distributions, justifies describing the conditional *log(j)* by uniform distributions (red curves).

In summary, the higher the corrosiveness, the higher the data dispersion (conditional variance), and the more likely the conditional *log(j)* was uniformly distributed at high potentials. Figure 6 indicates this trend of *log(j)* (typically presenting a unimodal distribution, such as at 0.5 V) potentially reaching a stochastic regime (characterised by the uniform distribution), such as at 1.35 V.

## Populations of experimental curves: local interpretation

Figures 8 to 12 show the entire population of the experimental curves for each CPP testing configuration. The representation was inspired by Individual Conditional Expectation (ICE) plots, which display one line per instance to show how the instance’s prediction changes when a feature changes and thereby help to represent many curves that are otherwise difficult to appreciate69. Each *j* vs *E* curve was coloured according to the distance between the sample and the conditional population mean (rainbow colourmap linearly spaced for *j*). Grey vertical lines were traced at potentials where the uniform function produced the best fit for the conditional *log(j)* distribution (Figs. 10, 11 and 12). It could be seen that the mean curves were not of much value for describing entire datasets, especially in the cases where both types of distribution (unimodal and uniform) were present.
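One possible way of deriving such a colouring is sketched below. It assumes that all curves share one potential grid and that the "distance" is the signed, point-wise deviation of *j* from the conditional mean; both points, as well as the function names and the synthetic curves, are assumptions rather than the procedure behind Figs. 8 to 12.

```python
import numpy as np

def deviation_from_mean(curves):
    """Point-wise deviation of every curve from the conditional mean.

    `curves` has shape (n_curves, n_potentials): one row per j vs E curve,
    all sampled on the same potential grid.
    Returns (mean_curve, deviations), with deviations shaped like `curves`.
    """
    curves = np.asarray(curves, dtype=float)
    mean_curve = curves.mean(axis=0)   # conditional mean at each potential
    return mean_curve, curves - mean_curve

def to_unit_interval(values):
    """Linearly map values to [0, 1] so they can index a colourmap."""
    lo, hi = float(values.min()), float(values.max())
    if hi == lo:
        return np.full_like(values, 0.5, dtype=float)
    return (values - lo) / (hi - lo)

# Synthetic stand-in population (assumption): 200 noisy curves on 50 potentials
rng = np.random.default_rng(4)
E_grid = np.linspace(0.5, 1.4, 50)
baseline = 100.0 + 400.0 * (E_grid - 0.5)
curves = baseline + rng.normal(scale=30.0, size=(200, E_grid.size))

mean_curve, deviations = deviation_from_mean(curves)
colour_values = to_unit_interval(deviations)
```

The values in `colour_values` can then be passed to any linear colourmap (for example the rainbow colourmap of a plotting library) when each curve segment is drawn.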

These experimental curves confirmed that the considerable data scattering observed at increased potentials was largely caused by pitting corrosion events. One could follow the increased complexity in the *j* distributions (accompanied by the increased conditional variances, Fig. 5) with the testing aggressiveness. In particular, the discontinuous profile of the conditional *log(j)* distributions observed for 0.01 M NaCl (100 mV/s) at high potentials (Figs. 7 – b and S5) could be attributed to current spikes (pitting) in individual *j* vs *E* curves, as shown in Fig. 9. The diversity in terms of current spike profiles was most likely related to pits differing in size, number, location (concerning the heterogeneous sample surface) and position (concerning the droplet-confined region).

One can see that when the conditional *log(j)* was uniformly distributed at high potentials (Figs. 10 and 11), nearly the entire data population had reached stable pitting. However, in less aggressive conditions (Figs. 8 and 9), although the totality of curves also eventually reached stable pitting (at *E* = ~ 1.25 V), the *log(j)* values were never considered uniformly distributed.

The local interpretation of curves suggested that stable pitting alone was not sufficient to trigger the stochastic behaviour of *log(j)*. From a global interpretation perspective (Fig. 5), a “conditional variance threshold” would also need to be exceeded for the *log(j)* distribution to exhibit high randomness. One can observe that the variance values for 0.01 M NaCl (50 mV/s) and 0.05 M NaCl (100 mV/s) were significantly larger than for 0.005 M NaCl and 0.01 M NaCl (100 mV/s) in the stable pitting region. Specifically, it seemed that a conditional variance of *log(j)* higher than 9 was correlated with the stochastic behaviour.

An exception to this “variance threshold” would be the most aggressive condition (0.05 M NaCl, 50 mV/s), in which the uniform distribution appeared at much lower potentials (Fig. 12). There, not only did the highly random behaviour occur with variance values significantly lower than 9 (Fig. 5), but also only a minority of the curve population displayed stable pitting behaviour at the concerned potentials.

The three data clusters distinguishable in Fig. 12 might explain the overall stochastic behaviour of this dataset. These three groups were defined by: passive-like behaviour regardless of *E*, stable pitting for *E* > ~1.15 V, and stable pitting at much lower potentials (starting from *E* = 0.5 V). The origin of this three-clustered behaviour is unknown to date and will be addressed in the future with approaches for identical-location correlation of SECCM footprints with surface analysis. As the 316L sample surface was randomly probed multiple times, the differences observed among the datasets were considered significant. In other words, particular surface features (grains, grain boundaries, inclusions, etc.2,5), which determine the three-fold behaviour observed at 0.05 M NaCl (50 mV/s), had the same likelihood of being scanned regardless of the testing configuration. Therefore, a probable explanation would be that particular surface features are more or less active depending on the experiment’s corrosiveness. Higher testing aggressiveness would trigger more diverse (higher variance) electrochemical responses, and the outcomes of Fig. 12 would be an extreme case in this sense. Corroborating this hypothesis, careful inspection of the next most aggressive tests revealed additional behaviour clusters similar to the extreme case (Fig. 12): stable pitting at much lower potentials (Figs. 10 and 11), and a few curves remaining passive-like throughout the *E* range (Fig. 11). On the contrary, in the least aggressive conditions (Figs. 8 and 9), only one cluster of behaviour typical of 316L polarisation was observed, implying that the electrochemical responses were overall less sensitive to local surface variabilities.

In conclusion, the more aggressive the testing condition, the higher: 1. the conditional variance (a measure of randomness) of the *log(j)* distributions, in general and especially in the stable pitting region; 2. the likelihood of a uniform distribution (high randomness) related to stable pitting behaviour (for combinations of [Cl−] and scan rate at least as aggressive as 0.01 M and 50 mV/s, respectively); 3. the conditional variance of the uniform distribution.