The Significance of Neural Inter-Frequency Correlations

It is of great interest in neuroscience to determine which frequency bands in the brain contain common information. However, to date, a comprehensive statistical approach to this question has been lacking. As such, this work presents a novel statistical significance test for correlated power across frequency bands in non-stationary time series. The test accounts for biases that often go untreated in time-frequency analysis, i.e. intra-frequency autocorrelation, inter-frequency non-dyadicity, and multiple testing under dependency. It is used to test all of the inter-frequency correlations between 0.2 and 8500 Hz in continuous intracortical extracellular neural recordings, using a very large, publicly available dataset. The results show that neural processes have signatures across a very broad range of frequency bands. In particular, LFP frequency bands as low as 20 Hz were found to almost always be significantly correlated to kHz frequency ranges. The test also has applications in a broad range of fields, e.g. biological signal processing, economics, finance, and climatology. It is useful whenever one wants to robustly determine whether short-term components in a signal are related to long-term trends, or which frequencies contain common information.


Previous Work on Neural Inter-Frequency Relationships
Despite significant research, it is unclear in the literature what frequency bands in neural signals contain common information. This is an important question, as its answer has consequences for many subfields of neuroscience. These include modelling, understanding biological mechanisms, computational neuroscience, functional connectivity, the compression of neural signals in Brain-Machine Interfaces (BMI), and many more. Many studies have looked at inter-frequency relationships in neural signals. In general, they belong to one of two classes. The first consists of indirect decoding methods. The second involves directly comparing neural features.

Class 1: Indirect decoding methods
In this class, various neural time-frequency features are compared in their ability to decode various behaviours/stimuli. This class primarily serves to inform us about useful neural features for decoding. However, it has also historically stood as the basis for determining whether neural features contain common information.
Unless care is taken, the results from Class 1 are susceptible to misinterpretation. Orthogonal features may have similar decoding ability, even on specific tasks, despite the features containing no common information. As such, methodologically, the comparative decoding ability of different features does not always indicate whether or not those features contain common information.

The Contribution of this Work
To get a systematic understanding of inter-frequency relationships, one needs to robustly analyse all present inter-frequency relationships simultaneously in a dataset. This makes it easier to compare results across different studies. It also allows us to discover all of the interesting inter-frequency relationships in the dataset. As such, it prevents the literature from becoming biased towards a narrower selection of relationships that are a priori believed to be potentially interesting and are investigated individually.
In this work, we propose a novel Monte Carlo (MC) statistical significance test for the multiple testing of correlated frequency bands in non-stationary time-series. This test controls the False Discovery Rate under a wide range of dependencies, and accounts for oversampling of the frequency domain and intra-frequency autocorrelation. Specifically, we propose a statistical significance test for inter-scale correlations in the Wavelet Power Spectrum (WPS) of the Continuous Wavelet Transform (CWT). This method applies to any non-stationary time series, under certain dependency conditions [46]. This method can also be generalised to other non-orthogonal Time-Frequency Distributions (TFD), e.g. the Spectrogram, and different measures of association, given certain normality constraints under the null.
We use this test to determine statistically significant inter-frequency correlations in broadband intracortical neural signals from Macaque M1 cortex. The data is from a publicly available dataset [47]. The bulk of this work was done using MATLAB R2020b. Most of the computations were done on the Imperial College London High Performance Computing cluster. All results and code from this work have been made publicly available at Zenodo [48] and Github [49], respectively.
It warrants mentioning that [50] proposed an analytic method for determining the significance of inter-frequency correlations, used in that work for testing EEG data. The method assumes that the data fits a modulated cyclostationary model, which is a specific subset of non-stationary signals. As such, in cases of modulated cyclostationary data with strong harmonics, the author recommends that the method from [50] be used, due to its analytical nature. For more general classes of non-stationarity, the method proposed in this work applies, due to the lack of assumptions it makes about the data.

Choice of Time-Frequency Decomposition
Neural signals are non-stationary, and so classic Fourier decomposition is insufficient for their analysis. TFDs can be used to analyse how the power in different frequency bands in a signal changes over time. This allows one to determine what frequency bands contain common information.

Wavelet Decomposition
A wavelet function, i.e. wavelet, is any function ψ ∈ L²(ℝ) of zero average, with ‖ψ‖ = 1, centered at t = 0 [71]. By scaling, i.e. stretching, the mother wavelet ψ by a positive quantity a and translating it by τ ∈ ℝ, we define a family of time-frequency atoms ψ_{a,τ} as:

ψ_{a,τ}(t) = (1/√a) ψ((t − τ)/a).

Decomposing the time series signal along the time-frequency atoms produces a TFD. Scales correspond to frequencies, and the translation gives us insight into the time domain [71]. a ∈ {a ∈ ℕ⁺ | 1 ≤ a ≤ u} is the scale index, where u is the number of analysed scales, i.e. frequency bands.

Choice of the Continuous Wavelet Transform
The details of the WT decomposition depend on the orthogonality and scale 'dyadicity' of the WT in question [70]. Non-orthogonal WTs oversample the time domain, meaning that adjacent coefficients in the same scale can sample some of the same data. A non-orthogonal wavelet can be visualised as a window that moves one sample each time, and uses the windowed samples to calculate a coefficient. Wider windows produce more autocorrelation between adjacent coefficients, as they largely sample the same data. As such, non-orthogonal WTs produce intra-scale autocorrelation, and the wider the scale the more autocorrelation. Inversely, orthogonal WTs optimally sample the time domain, so that the autocorrelation between adjacent intra-scale coefficients is approximately zero [70,72,73]. Therefore, in orthogonal WTs, wider scales produce fewer coefficients than narrower scales. This is because there is no overlap between the time samples contributing to each coefficient within each scale. E.g., if a scale was equal in width to half the signal length, an orthogonal sampling would only produce 2 samples at that scale.
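The growth of intra-scale autocorrelation with window width can be illustrated with a simple moving-average stand-in for a non-orthogonal scale (a hedged Python sketch, not the CWT itself; the window widths are arbitrary):

```python
import numpy as np

# White noise filtered by sliding windows of increasing width,
# standing in for the wider and narrower scales of a non-orthogonal WT.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)

def lag1_autocorr(y):
    # Pearson correlation between consecutive coefficients
    return np.corrcoef(y[:-1], y[1:])[0, 1]

acs = {}
for w in (4, 16, 64):
    # Each window shares w-1 samples with its neighbour, so adjacent
    # coefficients are strongly correlated; theory predicts (w-1)/w.
    coeffs = np.convolve(x, np.ones(w) / w, mode="valid")
    acs[w] = lag1_autocorr(coeffs)
```

The wider the window, the closer the lag-1 autocorrelation gets to 1, mirroring the claim that wider non-orthogonal scales produce more intra-scale autocorrelation.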
Scale non-dyadicity means that adjacent scales overlap somewhat. In other words, the frequency domain is over-sampled. This creates an intrinsic correlation between nearby scales regardless of the input signal. This is sometimes referred to as redundancy between scales. Dyadicity means that there is no overlap between scales: the frequency domain is optimally sampled.
The Continuous Wavelet Transform (CWT) uses non-orthogonal wavelets and non-dyadic scales, meaning it oversamples both time and frequency domains. The Discrete Wavelet Transform (DWT) uses orthogonal wavelets and dyadic scales, meaning it optimally samples both domains. The Maximal Overlap DWT (MODWT), i.e. Stationary Wavelet Transform (SWT), uses non-orthogonal wavelets and dyadic scales [51,70], meaning it optimally samples the frequency domain but oversamples the time domain.
In our application, non-orthogonality is a requirement. This is because calculating the Inter-Scale Correlation (ISC) requires an equal number of coefficients per scale. As such, only the CWT and MODWT are appropriate. However, the cost of non-orthogonality is significant intra-scale autocorrelation that must be accounted for during the statistical testing [68,69].
Between the CWT and MODWT, one would expect the MODWT to be more convenient for testing ISCs, due to the dyadicity. However, based on our observation of MODWT and CWT inter-scale correlation matrices of intracortical neural data, the frequency over-sampling of the CWT appears very valuable in terms of both interpretability and detail. As such, this work presents an inter-scale correlation statistical significance test for CWT. The resulting non-dyadicity must similarly be accounted for.

Measuring Inter-Scale Correlations
To measure the CWT Inter-Scale Correlations (ISC) in a signal, one first takes the CWT of the analysed signal. Then one obtains the WPS by taking the point-wise square of the CWT's absolute value. In this work, the neural WPS is denoted as S, where individual scales are denoted as s_a, again where a ∈ {a ∈ ℕ⁺ | 1 ≤ a ≤ u} is the scale index, and u is the number of analysed scales. Next, one takes the time-wise correlation between WPS scales s_a and s_b, for all 1 ≤ a, b ≤ u pairs. This gives the Inter-Scale Correlation Matrix (ISCM) of the analysed signal. In this work, the Morse wavelet is used. As such, the ISCM can be interpreted as the inter-frequency correlation matrix, due to the unique mapping of scale to frequency for the Morse wavelet [74]. Taking the ISCM effectively tells one how correlated the power in different frequency bands is, given the CWT decomposition. This process is extensively discussed in the Methods section.
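Given a WPS matrix, the ISCM is simply the row-wise correlation matrix. A minimal sketch, assuming a toy WPS (the shared envelope injected into scales 1 and 4 is hypothetical, standing in for a neural process with signatures in two bands):

```python
import numpy as np

rng = np.random.default_rng(1)
u, n = 6, 20_000                       # number of scales, time samples

# Toy WPS: independent power per scale, with a shared slow amplitude
# envelope injected into scales 1 and 4 only.
S = rng.standard_normal((u, n)) ** 2
envelope = 1.0 + 0.9 * np.sin(2.0 * np.pi * np.arange(n) / 500.0)
S[1] *= envelope
S[4] *= envelope

# ISCM: time-wise Pearson correlation between every pair of scales.
R = np.corrcoef(S)                     # R[a, b] = r_{a,b}, shape (u, u)
```

The pair of scales sharing the envelope shows a clearly elevated ISC relative to unrelated scale pairs.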

Multiple Testing of Correlation Matrices
When performing multiple tests simultaneously, taking the results to be statistically significant based on their unadjusted marginal p-values creates a large risk of falsely rejecting true null hypotheses [45]. This is generally solved by controlling the expected proportion of falsely rejected hypotheses, the FDR, at some level α, with 0 ≤ α ≤ 1 [75]. [76] pioneered work in demonstrating that strong dependencies between test statistics, such as those intrinsic to a correlation matrix, can significantly degrade the effectiveness of many FDR procedures. In [46], Cai and Liu proposed a method for testing each element of a correlation matrix that successfully controls the FDR under a wide range of dependencies. This method has two steps. The first transforms the to-be-tested Pearson correlation values r_{a,b} into test statistics T_{a,b}, which need to be asymptotically normally distributed under the null. The second performs multiple testing on the T_{a,b} while controlling the multiplicity under dependency.
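The second step can be sketched as a BH-type search for the smallest rejection threshold whose estimated false discovery proportion stays below α. This is a hedged reimplementation of the thresholding idea from [46], not their exact procedure; the grid size and fallback are illustrative choices:

```python
import numpy as np
from math import erf, log, sqrt

def normal_cdf(t):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def fdr_threshold(T, alpha):
    """Find the smallest threshold t such that the estimated false
    discovery proportion m * G(t) / max(R(t), 1) is at most alpha,
    where G(t) = 2(1 - Phi(t)) is the two-sided null tail probability
    and R(t) is the number of statistics exceeding t."""
    T = np.abs(np.ravel(T))
    m = T.size
    upper = sqrt(4.0 * log(m) - 2.0 * log(log(m)))  # search upper bound
    for t in np.linspace(0.0, upper, 2000):
        G = 2.0 * (1.0 - normal_cdf(t))
        if m * G / max(int((T >= t).sum()), 1) <= alpha:
            return t
    return sqrt(4.0 * log(m))   # fallback when no threshold qualifies

# Hypothetical example: 10000 null statistics plus 50 strong signals.
rng = np.random.default_rng(2)
T = np.concatenate([rng.standard_normal(10_000), np.full(50, 6.0)])
t_hat = fdr_threshold(T, alpha=0.05)
```

Statistics with |T_{a,b}| ≥ t̂ are declared significant; in this toy mixture the injected signals are all rejected while very few nulls are.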
For this work, we used MC processes to produce normal, null-hypothesis ISC distributions. We then tested which elements of the neural ISCMs were significantly different from the null, using the second step from [46]. We were not able to use the test statistic proposed in [46], due to the significant intra-scale autocorrelation in our signals, which violated its i.i.d. assumptions. Its kurtosis assumptions were also violated.

Our Custom Monte Carlo Test Statistic
When testing for significant ISCs between non-orthogonal, non-dyadic scales in non-stationary signals, three sources of spurious correlation need to be accounted for. The first is the ISC due to the non-dyadicity. This is produced by the TFD, independent of the input signal: adjacent scales partially sample the same frequencies, which creates an intrinsic correlation between them, as discussed earlier in Section 1.3.3. The second source of bias is the intra-scale autocorrelation introduced by the TFD's non-orthogonality. This is where adjacent WPS coefficients in the same scale can be significantly autocorrelated, also discussed in Section 1.3.3. If autocorrelation is not accounted for when determining the significance of cross-correlations, it can produce falsely significant results [8,38–44]. The third source of bias is the intra-scale autocorrelation due to 2nd-order properties of the non-stationary signal. E.g., in a non-stationary signal, the autocorrelation structure of the power in a frequency band may be significantly different from that predicted by white noise input to the TFD. This can be observed in Fig. 1.
Our null distribution, which accounts for all three sources of spurious ISC, is the sum of two elements. The first is the mean intrinsic CWT Morse wavelet ISCM from stationary white noise input. With white noise input, the power in frequency bands is non-autocorrelated. As such, any measured ISC will be due to the other two sources of bias: the non-dyadicity, and random chance when taking the cross-correlation between two autocorrelated sequences, where the autocorrelation is due only to the CWT non-orthogonality. However, the mean cross-correlation between two separately autocorrelated, otherwise random sequences is zero. As such, across many white noise processes, the mean white noise ISCM estimates the contribution of only the non-dyadic sampling. This is a 'distorting' contribution of the CWT, independent of the input signal. An example of the mean ISCM from white noise processes of length 500 s can be seen in the Supplemental Material, Fig. 13 (a-b).
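The claim that the mean cross-correlation between two separately autocorrelated, otherwise random sequences is zero can be checked numerically. A small sketch, using AR(1) sequences as a hypothetical stand-in for the autocorrelation the CWT induces within a scale:

```python
import numpy as np

rng = np.random.default_rng(3)
n, L, phi = 2000, 400, 0.95

# L pairs of independent AR(1) sequences, each strongly autocorrelated.
e1 = rng.standard_normal((L, n))
e2 = rng.standard_normal((L, n))
x = np.zeros((L, n))
y = np.zeros((L, n))
x[:, 0], y[:, 0] = e1[:, 0], e2[:, 0]
for k in range(1, n):
    x[:, k] = phi * x[:, k - 1] + e1[:, k]
    y[:, k] = phi * y[:, k - 1] + e2[:, k]

# Pearson correlation of each independent pair.
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
corrs = (xc * yc).sum(1) / np.sqrt((xc ** 2).sum(1) * (yc ** 2).sum(1))
mean_corr = corrs.mean()
```

Individual pairs show sizeable spurious correlations, but their mean across many MC iterations is indistinguishable from zero, which is why averaging isolates the non-dyadic contribution.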
The second element is a normal distribution, and depends on the input signal. It is obtained by taking the neural WPS S, and phase-randomising each scale s_a independently. Phase randomisation is a common method for finding the significance of correlations between autocorrelated time series [8,39,41–44]. The Fourier Transform (FT) phase randomisation method was used. The choice of the FT method is discussed in the Supplemental Material (Section 6.2), along with implementation details. We then calculated the ISCM of the phase-randomised WPS S. We repeated this multiple times for each recording in a MC simulation. This distribution of ISCs across MC iterations originates purely from intra-scale autocorrelation, which is due to both the CWT non-orthogonality and the non-stationarity of the signal. The phase randomisation destroys the ISC due to non-dyadicity, as any shared inter-scale phase relation is randomised.
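The FT phase randomisation step can be sketched as follows. This is a generic implementation of the standard method, assuming nothing about the paper's specific implementation details (which live in its Supplemental Material):

```python
import numpy as np

def phase_randomise(s, rng):
    """FT phase randomisation: keep the Fourier amplitude spectrum of s,
    replace the phases with uniform random ones. This preserves the
    autocorrelation structure while destroying phase relations."""
    n = len(s)
    X = np.fft.rfft(s)
    phases = rng.uniform(0.0, 2.0 * np.pi, X.size)
    phases[0] = 0.0              # DC term must stay real
    if n % 2 == 0:
        phases[-1] = 0.0         # Nyquist term must stay real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=n)

rng = np.random.default_rng(4)
s = np.cumsum(rng.standard_normal(1024))   # an autocorrelated series
surrogate = phase_randomise(s, rng)
```

The surrogate has exactly the original amplitude spectrum (hence the same autocorrelation, by the Wiener-Khinchin theorem) but a different waveform.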
For a neural ISC to be significant, it needs to be statistically unlikely for it to originate from the combination of the three sources. Therefore, we test whether the neural ISC is larger than that predicted by the sum of the two MC-generated elements. As such, any significant deviation from the null statistically represents a genuine trait of the tested signal, rather than a spurious contribution of the CWT and signal intra-frequency autocorrelation. See Methods Section 4.4 for details on the production of the null distributions, and on multiple testing using the method from [46]. The motivation for adding the two elements to create the null distribution is also discussed further in the Supplemental Material (Section 6.4).

Results
We calculated and tested the significance of the ISCMs for 96 channels across 15 sessions each, for a total of 1440 broadband intracortical recordings (see Section 4.1). However, to avoid data-dredging any future work, we analysed only a subset of the results, i.e. channels 1-20, in this work. The complete set of ISCMs for all 96 channels and their associated figures are publicly available at [48], and the associated code at [49]. The full set of results analysed in this work includes 300 ISCMs, and so they are not all shown individually. A subset of the statistically significant elements of the ISCMs is shown in Fig. 2. We define low frequencies as 0.1-10 Hz, medium frequencies as 10-100 Hz, and high frequencies as 100-7500 Hz. Individually for all recordings, we controlled the FDR at an α of 0.01.

Neural processes reliably have signatures in frequencies as low as the beta band
The most interesting novel result is that high frequencies are almost always correlated to other high and to medium frequencies.
In particular, even frequencies as low as 20 Hz, which correspond to the beta band, are almost always correlated to kHz range frequencies. It has been considered before that various LFP bands are correlated to spikes. However, this work robustly shows for the first time that LFP frequency bands are correlated to spikes in a continuous range down to 20 Hz. For individual recordings, this can be observed in Fig. 2 and 3. In Fig. 4 (b,f), it can be seen that over 75% of analysed recordings had the same significant mid-high frequency correlations, including the beta band. Generally, these results show that neural processes have signatures across a very broad range of frequency bands. This opens up an interesting opportunity to explore the biological mechanisms behind these signatures, and to explore relatively low-bandwidth encodings of neural information. This is discussed further in Sections 3.3 and 3.4, and will be the subject of future work. The significant correlations are quite low in magnitude, but this is to be expected when sampling potentially sparse neural processes at a high sampling resolution. There may not be anything interesting happening in the vicinity of the electrodes a large amount of the time.
It warrants mentioning that there is a noticeable lack of significant correlations to frequencies in the 55-65 Hz range ( Fig. 4 (d,f)). This is due to the pre-processing, which involved randomising the phases in that range (Supplemental Material, Section 6.5).

Few low-high frequency correlations
There were few low-high frequency correlations in the results. We compared the number of significant low-high ISCs to the total number of tested low-high ISCs for each recording. The graphical intuition is that we tested how many of the elements in the top left corner of the ISCMs were significant for each recording. This is shown in Fig. 5, a histogram of the number of analysed recordings that achieved different ratios of significance relative to all tested low-high relationships. The vast majority of the analysed recordings achieved ratios that were near zero. This shows that significant low-high frequency correlations are quite rare across the analysed recordings.
However, this is in direct disagreement with other studies. In [77], it was found that spike rates could be reliably estimated from sub-4 Hz LFP time-domain features. In that study, the autocorrelation of the data was accounted for. As such, the simplest explanation for the difference between our results and those of [77] is that the datasets in question are different. This study used Utah arrays and CSD referencing, while [77] used microwire electrodes and no referencing. As such, it bears repeating that different datasets and feature extraction methods will likely produce different results. This suggests that future work should involve comparing such different datasets and feature extraction methods.

The vast majority of significant inter-scale correlations were positive
Interestingly, the vast majority of significant ISCs were positive (e.g. Fig. 2 and 3). I.e., when power went up in one frequency band, it generally went up, rather than down, in the other correlated frequency bands. This can be observed in Fig. 6, which shows that the vast majority of significant correlations (88.4%) were positive.

Time-domain artifacts create strong low-mid frequency correlations
Across all recordings, the comparison of time series and ISC plots showed that signals with strong time-domain artifacts consistently had strong low-mid ISCs (Fig. 7). Similarly, neural time series without strong artifacts consistently did not have strong low-mid ISCs. There were 13 recordings that clearly had abnormally strong low-mid ISCs, such as those shown in Fig. 7 (b, d). Of those 13, 10 were easily identifiable (e.g. Fig. 7 (a, c)) from observation of the time series on the author's first pass. As such, it seems likely that strong low-mid ISCs result from recording artifacts and/or pathological activity.

The neural signatures are chronically robust
These intra-channel relationships were generally quite robust across different recording sessions, i.e. across months. This is encouraging, as it suggests that the same neural processes are being recorded and are stable across long periods in this dataset.

Discussion

Significance of this Work
There are only three possible sources of significant correlation in the power of different frequency bands in a neural signal. These are random chance, non-neural sources e.g. electronics noise, and neural sources. In this work, pre-processing is used to remove non-neural sources. Then, a statistical significance test is used to determine how unlikely the observed result is to be due to random chance. As such, this work gives us the first ever method to discover statistically significant ISCs, due to neural sources, in broadband intracortical neural recordings.
However, the results of this work may still be contaminated by non-neural sources, potentially even biological sources. This can only be corrected by perfect pre-processing. This likely requires a strong familiarity with the neural recording set-up. As such, any inter-frequency analysis would benefit from an integrated understanding of how the data was gathered. Researchers are free to select from the publicly available results [48] to determine what constitutes non-neural activity, based on their knowledge of the publicly available data [78]. Plots of the Fourier transforms and time-domain signals, for each channel and session, are also provided at [48].
The statistical significance test can also be used for other applications, e.g. biological signal processing, economics, finance, climatology, etc. It is appropriate for any application where one wishes to see if long-term trends are related to shorter-term components in non-stationary time series, or what frequency bands contain common information.

Possible Improvements to the Statistical Test
Improvements to this test can likely be made. The current numerical method does not account for lag, due to the computational expense: a massive number of combinations of lag is possible across the different frequencies. Another obstacle is the difficulty of multiple testing across multiple lag permutations. As such, all tested relationships in this work are instantaneous.
While well understood and computationally simple, the Pearson correlation coefficient is limited to finding linear, 1st-order polynomial relationships. In this work, the Pearson correlation coefficient was used as the measure of association because its statistical significance characteristics are well understood. However, including other state-of-the-art association measures, capable of finding more general non-linear, non-functional relationships [79], is an interesting avenue for future research. The measure of association must, however, be normal under the null for the method in [46] to be appropriate.

Do only correlated frequency bands contain interesting information?
With sufficiently good pre-processing, only neural sources and random chance, controlled at some level, contribute to the significant elements of the ISCM. Here we will argue that if the neural process is not represented in the significant elements of the ISCM, it may be that it is not statistically measurable.
Assume x is a neural recording measuring biological, presumably neural, processes. In the TFD of x, assuming good enough time-frequency resolution and that the recorded processes have a consistent signature distinguishable from noise, we should see roughly the same signatures in the same frequencies whenever the same biological process occurs under the same conditions. We also assume that biological processes have signatures in multiple/broad frequency bands. The alternative would be a pure sinusoid electronic signature, which seems biologically implausible.
In a long enough recording, if 2 otherwise random signals are synchronized often enough, they will become statistically significantly correlated relative to noise with no synchronization. As such, we can say that in a long enough recording x, if there are biological processes 1) whose signatures are distinguishable from noise, and 2) which occur often enough, then in a TFD of x, the frequency bands that contain the signatures of the processes will become statistically correlated relative to random chance.
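The argument that intermittent synchronization eventually yields statistically detectable correlation can be illustrated with a toy simulation (hypothetical burst model; the injection rate and amplitude are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
x = rng.standard_normal(n)
y = rng.standard_normal(n)
baseline = np.corrcoef(x, y)[0, 1]          # no shared process yet

# Hypothetical shared "process signature": the same burst values are
# injected into both signals on a random 2% of samples.
bursts = rng.random(n) < 0.02
shared = 2.0 * rng.standard_normal(int(bursts.sum()))
x[bursts] += shared
y[bursts] += shared

r = np.corrcoef(x, y)[0, 1]
```

Even though the shared process is active only 2% of the time, the resulting correlation stands far above what chance alone produces at this recording length.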
The author's hypothesis is that only the statistically correlated frequency bands contain interesting neural information. This is because virtually all measurable biological processes presumably produce, in long enough recordings, statistical correlations between the frequency bands of their signatures. If there are no significant inter-frequency correlations for a given frequency band, then there is no biological process distinguishable from noise that has that frequency band as a signature. This is because such a process would likely also have other frequency bands as its signatures, which would produce a correlation between the signature frequencies. Therefore, looking at the statistically significant inter-frequency correlations should allow us to identify the subset of frequency bands that contain biological information. It may even allow us to rank the frequency bands by informativeness without having to do any decoding against an external metric, e.g. behavior. This has never been achieved before. Caveats to this claim are lag and non-linearity in the inter-frequency relationship, as discussed in Section 3.2.

Future work
Future work will include a Feature of Importance analysis to test the hypothesis in Section 4.4. We will test whether the frequency bands with few statistically significant ISCs have significantly worse behavioral decoding performance than those with more. This will allow us to answer whether significant ISCs, along with pre-processing, are a reliable method for extracting the frequency-signatures of neural processes. If successful, this may open the door to a state-of-the-art encoding of neural information based on ISCs.

The Dataset
The dataset is publicly available [47]. It consists of broadband intracortical recordings from the Primary Motor Cortex M1. The subject 'Indy' was an adult male Rhesus macaque monkey, performing a free reaching task.
The recordings are across 30 sessions. They are not segmented into trials, and have an average duration of ∼520 seconds. As such, they enable us to reliably analyse low-frequency aspects of the neural signals. In this work, we analysed the first 15 sessions in chronological order, from 2016-06-24-03 to 2016-10-17-02. These had an average length of ∼480 s. Only channels 1 to 20 were analysed, so that future work on these sessions is not data dredged.

Dataset Frequency Characteristics
In this dataset, the broadband data was pre-amplified with a PZ2 Preamplifier (Tucker-Davis Technologies, Alachua, FL) with a frequency response of 3 dB: 0.35 Hz - 7.5 kHz; 6 dB: 0.2 Hz - 8.5 kHz [80]. There was also an anti-aliasing filter built into the pre-amplifier: 4th-order low-pass with a roll-off of 24 dB per octave at 7.5 kHz. The signal was then sampled at F_s = 24414.0625 Hz and 16-bit resolution using a RZ2 BioAmp Processor (Tucker-Davis Technologies, Alachua, FL). There is also an anti-aliasing filter built into the recording amplifier: 2nd-order low-pass with a roll-off of 12 dB per octave at 7.5 kHz. As such, the recording configuration allows us to observe inter-frequency correlations between ∼0.2-7500 Hz.

Spectral Interpolation and Phase Randomisation for Line Noise Removal
There are three main sources of noise in neural recordings: electrode impedance thermal noise, tissue thermal noise and interface electronics noise [81]. Of the three, only interface electronics noise is likely to contain correlated power across different frequency bands. This is the aspect of noise that will impact our results the most. A clear example of interface electronics noise that produces correlated power in different frequency bands is line noise and its harmonics.
In this work, we used an adjusted Spectral Interpolation (SI) method to remove the line noise [82]. This avoided the distortion associated with commonly used notch filters, e.g. Gibbs ringing. Since we are particularly concerned with inter-frequency correlations, we differed from [82] in that we also phaserandomised the interpolated frequencies. Using the SI method, we removed each line noise harmonic up to the Nyquist frequency, as well as a few other noticeable components. The technical details are given in the Supplemental Material (Section 6.5).
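The combination of spectral interpolation and phase randomisation can be sketched as below. This is a hedged reimplementation of the general idea, not the exact adjusted SI method of [82]; the function and parameter names are illustrative:

```python
import numpy as np

def interpolate_and_randomise(x, fs, f0, half_bw, rng):
    """Linearly interpolate the amplitude spectrum across the band
    [f0 - half_bw, f0 + half_bw] from the band edges, and randomise
    the phases of the interpolated bins."""
    n = len(x)
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    idx = np.flatnonzero((freqs >= f0 - half_bw) & (freqs <= f0 + half_bw))
    lo, hi = idx[0] - 1, idx[-1] + 1          # bins just outside the band
    amp = np.interp(freqs[idx], [freqs[lo], freqs[hi]],
                    [abs(X[lo]), abs(X[hi])])
    X[idx] = amp * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, idx.size))
    return np.fft.irfft(X, n=n)

# Hypothetical example: remove a strong 60 Hz "line noise" component.
rng = np.random.default_rng(6)
fs, n = 1000.0, 10_000
t = np.arange(n) / fs
x = 5.0 * np.sin(2.0 * np.pi * 60.0 * t) + 0.1 * rng.standard_normal(n)
cleaned = interpolate_and_randomise(x, fs, f0=60.0, half_bw=1.0, rng=rng)
```

Unlike a notch filter, this leaves the rest of the spectrum untouched and introduces no ringing, while the randomised phases prevent the interpolated band from contributing spurious inter-frequency correlations.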

Current Source Density Referencing of Electrodes
Neural recordings can contain contributions from distant neural activity [29,83]. Therefore, to bias the recording towards local processes, the spectrally interpolated signals were referenced using Current Source Density (CSD) referencing, i.e. Laplacian referencing [84,85]. The technical details of the CSD procedure are given in the Supplemental Material (Section 6.6).
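The core of Laplacian referencing is subtracting each channel's local neighbourhood mean. A minimal sketch, assuming a hypothetical neighbourhood map (the real Utah-array geometry and the paper's exact CSD procedure are in the Supplemental Material, not encoded here):

```python
import numpy as np

def laplacian_reference(channels, neighbours):
    """Laplacian (CSD-style) referencing: subtract from each channel the
    mean of its spatial neighbours. `neighbours` maps channel index to a
    list of neighbouring channel indices."""
    out = np.empty_like(channels)
    for c in range(channels.shape[0]):
        out[c] = channels[c] - channels[neighbours[c]].mean(axis=0)
    return out

# Toy example: three channels sharing a strong distant (common) source.
rng = np.random.default_rng(7)
n = 5000
common = 10.0 * np.sin(2.0 * np.pi * np.arange(n) / 100.0)
local = rng.standard_normal((3, n))
channels = common + local
referenced = laplacian_reference(channels, {0: [1, 2], 1: [0, 2], 2: [0, 1]})
```

The shared distant component cancels in the referenced signals, biasing the recording towards local activity as intended.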

Standardizing the Length of the Recordings
The statistical significance test requires that many white noise processes are produced. These must be close to the same length as the analysed signal. To reduce the total amount of MC simulations in this work, the CSD-referenced neural recordings were shortened to a standard length. If the signal was between 350 and 400 s in length, it was shortened to 350 s. If it was between 400 and 500 s in length, it was shortened to 400 s. If it was above 500 s, it was shortened to 500 s. This was done by removing samples from the end of x so that n = F s × 350, n = F s × 400 or n = F s × 500.

Pre-whitening of Data
Pre-whitening is common in signal analysis, and is used to remove autocorrelations [38,39,41]. It makes the signal statistically behave as white noise, simplifying many tests. In our application, pre-whitening has the benefit of enabling the use of white noise as a phase-randomised version of each channel [8,38,39,41–44]. This dramatically reduced and simplified the Monte Carlo simulations. It has no effect on the phase relationships between frequencies in the neural data, which are what are being investigated. The pre-whitening was preceded by clipping the data to prevent edge effects. The technical details are given in the Supplemental Material (Section 6.7).
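Spectral pre-whitening amounts to flattening the amplitude spectrum while keeping the phases. A minimal sketch (the paper's clipping and other implementation details are in its Supplemental Material and are not reproduced here):

```python
import numpy as np

def prewhiten(x):
    """Flatten the Fourier amplitude spectrum of x while preserving its
    phases, so the output is statistically white."""
    n = len(x)
    X = np.fft.rfft(x)
    mag = np.abs(X)
    mag[mag == 0.0] = 1.0          # leave empty bins untouched
    return np.fft.irfft(X / mag, n=n)

rng = np.random.default_rng(8)
x = np.cumsum(rng.standard_normal(4096))    # strongly autocorrelated
w = prewhiten(x)
```

The output has a perfectly flat amplitude spectrum and essentially no autocorrelation, while the phase spectrum of the original signal is untouched.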

Continuous Wavelet Decomposition and the Morse Wavelet
Let x = {x_k ∈ ℝ | k ∈ ℕ⁺, 1 ≤ k ≤ n} be a discrete time series with equal time spacing Δt, where n is the number of samples. For the CWT on a discrete sequence [60,65], the decomposition coefficients W(a, τ) are obtained by decomposing x over scaled and translated versions of the mother wavelet function ψ:

W(a, τ) = Σ_{k=1}^{n} x_k ψ*_{a,τ}(kΔt),     (3)

where a ∈ {a ∈ ℕ⁺ | 1 ≤ a ≤ u} denotes the scale, u is the number of scales in the CWT decomposition, τ ∈ {τ ∈ ℕ⁺ | 1 ≤ τ ≤ n} denotes the translation, and * denotes complex conjugation. For our CWT wavelet, we selected the continuous and analytic Morse wavelet at γ = 3, known as the Airy wavelet. It has a number of desirable properties, e.g. an almost minimal Heisenberg area and a unique mapping of scale to frequency [74]. However, when looking at common information between frequency bands in terms of power, we are only interested in wavelet power spectrum statistics. As such, the results will be similar across wavelets, as the choice of wavelet is generally not critical for wavelet power spectrum analysis [68].
The Morse wavelet [86] is defined in the frequency domain by:

Ψ_{B,γ}(ω) = U(ω) a_{B,γ} ω^B e^{−ω^γ},

where the order B controls the low-frequency behaviour, the family γ controls the high-frequency decay, U(ω) is the unit step function, and a_{B,γ} is a normalising constant.
In this work, we used the default MATLAB value of B = 20. For more details on the Morse wavelet, see [87,88]. Being an analytic wavelet, it produces complex coefficients. To get the WPS coefficients S, one takes the element-wise square of the absolute value of the complex wavelet coefficients [68]:

S(a, τ) = |W(a, τ)|².     (4)

This gives one the estimated power at each scale and moment in time. In this work, a scale is taken to be a frequency band, or the time-wise vector of WPS coefficients at the given scale index. A frequency is a single frequency, many of which can be sampled within a scale.
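A frequency-domain sketch of the Morse CWT and its WPS is given below. This is an illustrative reimplementation, not MATLAB's `cwt`: the normalising constant is dropped, and only the wavelet's functional form ω^B e^{−ω^γ} (with the document's B = 20, γ = 3) is used:

```python
import numpy as np

def morse_wps(x, scales, gamma=3.0, beta=20.0):
    """CWT with an analytic Morse-style wavelet psi(w) ~ w^beta *
    exp(-w^gamma) for w > 0, computed per scale via the FFT, followed
    by the WPS S = |W|^2. Normalisation is illustrative."""
    n = len(x)
    omega = 2.0 * np.pi * np.fft.fftfreq(n)   # angular frequency / sample
    X = np.fft.fft(x)
    S = np.empty((len(scales), n))
    for i, a in enumerate(scales):
        w = a * omega
        psi = np.zeros(n)
        pos = w > 0
        # log-domain evaluation avoids overflow of w**beta
        psi[pos] = np.exp(beta * np.log(w[pos]) - w[pos] ** gamma)
        W = np.fft.ifft(X * psi)              # analytic: positive freqs only
        S[i] = np.abs(W) ** 2
    return S

# A 50 Hz tone at fs = 1000 Hz should peak near scale
# a* = (beta/gamma)**(1/gamma) / (2*pi*50/1000) ≈ 6.0.
fs, n = 1000.0, 4000
x = np.sin(2.0 * np.pi * 50.0 * np.arange(n) / fs)
scales = np.geomspace(1.0, 40.0, 60)
S = morse_wps(x, scales)
peak_scale = scales[int(np.argmax(S.mean(axis=1)))]
```

The scale of peak mean power maps to the tone's frequency through the wavelet's peak frequency (B/γ)^{1/γ}, illustrating the unique scale-to-frequency mapping used to interpret the ISCM as an inter-frequency correlation matrix.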

COI
The CWT has a feature called the Cone of Influence (COI). This refers to the zone of the WPS where edge effects are absent and the coefficients can be relied upon to be accurate. Outside the COI, the stretched wavelet includes contributions from padded samples beyond the ends of $x$. Wider scales use more time samples in their calculation than narrower scales, so the zone in which wider scales extend beyond $x$ is larger. A visual example of the COI is shown in Fig. 9 (b).

Pearson Correlation Coefficient
Let $s_a$ denote the vector containing the coefficients of matrix $S$ at scale index $a$, $s_a = (s_{a,1}, \ldots, s_{a,n})$. The inter-scale Pearson correlation $r_{a,b}$, i.e. ISC, between two WPS scales $s_a$ and $s_b$ is given by:

$$r_{a,b} = \frac{\sum_{k=1}^{n} (s_{a,k} - \bar{s}_a)(s_{b,k} - \bar{s}_b)}{\sqrt{\sum_{k=1}^{n} (s_{a,k} - \bar{s}_a)^2}\,\sqrt{\sum_{k=1}^{n} (s_{b,k} - \bar{s}_b)^2}}, \qquad (5)$$

where $\bar{s}_a = \frac{1}{n}\sum_{k=1}^{n} s_{a,k}$, and $\{a, b \in \mathbb{N}^+ \mid 1 \le a, b \le u\}$. $R = (r_{a,b})_{u \times u}$ denotes the full matrix of ISCs, i.e. the ISCM. Only samples within the COI are reliable. As such, WPS scales for which less than 90% of the samples were within the COI were removed. Then, samples were removed from the beginning and end of the WPS so that no sample at any remaining scale was outside the COI. As such, each ISC was calculated using only samples that were within the COI for both scales. This method was thousands of times more computationally efficient than taking the pairwise correlation between all samples within the COI. For all recordings, the remaining scales covered 0.11 to 8500 Hz.
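Given a WPS matrix, the ISCM itself is simply a row-wise Pearson correlation matrix. A minimal sketch (which, for brevity, omits the COI trimming described above):

```python
import numpy as np

def iscm(S):
    """Inter-Scale Correlation Matrix: the Pearson correlation of
    equation (5) between every pair of scale vectors, i.e. rows of S."""
    # np.corrcoef treats each row as one variable, matching the s_a vectors
    return np.corrcoef(S)
```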

Data post-processing summary
The data post-processing involved CWT-decomposing the pre-processed data with equation (3), extracting the Wavelet Power Spectrum coefficients S with equation (4), and finally calculating the ISCM R via equation (5) and as described in Section 4.3.3. The complete data flow for pre- and post-processing of the neural recordings is given in Fig. 8. An example of a neural data ISCM and associated WPS is shown in Fig. 9 (a-b).

Estimating the Mean White Noise Inter-Scale Correlation Matrix
To generate the first null element in our test statistic, i.e. the mean white noise ISCM, we performed a MC simulation. This produced $L = 1000$ random white noise signals for each standard recording length (i.e. 350, 400 and 500 s). The generation of the white noise signals is discussed in the Supplemental Material (Section 6.8). For each MC white noise signal, we followed the same pre- and post-processing steps as for the neural CSD-referenced and shortened data. Firstly, the end of the sequence was clipped. Secondly, the sequence was pre-whitened to ensure a completely identical and flat Fourier spectrum. Then we performed the CWT, and calculated the random white-noise WPS coefficients $Q = (q_{v,a,k})_{L \times u \times n}$. Let $q_{v,a}$ denote the vector containing the coefficients of $Q$ at scale index $a$ and MC iteration $v$, $q_{v,a} = (q_{v,a,1}, \ldots, q_{v,a,n})$. The inter-scale Pearson correlation $\rho_{v,a,b}$ between two scales $q_{v,a}$ and $q_{v,b}$ from MC iteration $v$ is given by:

$$\rho_{v,a,b} = \frac{\sum_{k=1}^{n} (q_{v,a,k} - \bar{q}_{v,a})(q_{v,b,k} - \bar{q}_{v,b})}{\sqrt{\sum_{k=1}^{n} (q_{v,a,k} - \bar{q}_{v,a})^2}\,\sqrt{\sum_{k=1}^{n} (q_{v,b,k} - \bar{q}_{v,b})^2}},$$

where $\bar{q}_{v,a} = \frac{1}{n}\sum_{k=1}^{n} q_{v,a,k}$ denotes the mean of $q_{v,a}$. $n$ depends on whether the MC process is 350, 400 or 500 s long, and on the clipping in MC iteration $v$. The combined data flow for the CWT-contribution MC is shown in Fig. 10. The mean ISCM was then taken for each standardised recording length: $\bar{\rho}_{a,b} = \sum_{v=1}^{L} \rho_{v,a,b} / L$. As an example, Fig. 13 (a-b) (Supplemental Material, Section 6.4) gives the mean MC-derived ISCM for white noise processes of length 500 s.
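The structure of this MC estimate can be sketched as below. For brevity, the sketch replaces the Morse CWT with a crude bank of moving-average "scales" (an illustrative stand-in, not the paper's decomposition), which already reproduces the key effect: neighbouring scales of pure white noise are spuriously correlated.

```python
import numpy as np

def scale_power(x, widths):
    """Toy stand-in for a WPS: squared outputs of moving-average filters
    of increasing width, one row per 'scale'."""
    return np.array([np.convolve(x, np.ones(w) / w, mode="same") ** 2
                     for w in widths])

def mean_white_noise_iscm(n, widths, L=200, seed=1):
    """Average the ISCM over L white-noise realisations (the paper used
    L = 1000 per standard recording length)."""
    rng = np.random.default_rng(seed)
    acc = np.zeros((len(widths), len(widths)))
    for _ in range(L):
        S = scale_power(rng.standard_normal(n), widths)
        acc += np.corrcoef(S)
    return acc / L
```

Even for this toy decomposition, the mean ISCM is far from the identity near the diagonal, which is the bias the white-noise null element corrects for.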

Generating the Normal Distribution due to Intra-Scale Autocorrelation
To generate the second null distribution, for each recording WPS $S$ and each scale $s_a$, the phases were FT-randomised for each MC iteration $h \in \{\mathbb{N}^+ \mid 1 \le h \le H\}$, with $H \approx 250$. This was a sufficient number of iterations to reliably determine the mean and standard deviation of the normal distribution. We denote the phase-randomised version of $s_a$ in MC iteration $h$ as $\mu_{h,a}$. The inter-scale Pearson correlation $\phi_{h,a,b}$ between two scales $\mu_{h,a}$ and $\mu_{h,b}$ is given by:

$$\phi_{h,a,b} = \frac{\sum_{k=1}^{n} (\mu_{h,a,k} - \bar{\mu}_{h,a})(\mu_{h,b,k} - \bar{\mu}_{h,b})}{\sqrt{\sum_{k=1}^{n} (\mu_{h,a,k} - \bar{\mu}_{h,a})^2}\,\sqrt{\sum_{k=1}^{n} (\mu_{h,b,k} - \bar{\mu}_{h,b})^2}},$$

where $\bar{\mu}_{h,a}$ is the mean of $\mu_{h,a}$.
There are $H$ elements in $\phi_{a,b}$ for each $(a, b)$, where $\phi_{a,b} = (\phi_{1,a,b}, \ldots, \phi_{H,a,b})$. Observation (Supplemental Material, Fig. 12 (a-c)), followed by normality testing (Supplemental Material, Fig. 12 (d-i)), of the histograms of the $\phi_{a,b}$ vectors revealed that the vast majority were normally distributed at an FDR $\alpha$ of 0.05. As such, they were compatible with the multiple testing procedure from [46].
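This surrogate-correlation step can be sketched as follows (a minimal Python/NumPy illustration; fixing the DC and Nyquist phases, and the particular random-phase draw, are implementation conveniences rather than the paper's code):

```python
import numpy as np

def ft_phase_randomise(v, rng):
    """FT surrogate: keep the Fourier amplitudes of v, draw new phases.
    The DC and (for even length) Nyquist bins are kept real so the
    inverse transform is exactly real."""
    V = np.fft.rfft(v)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(V))
    phases[0] = 0.0
    if len(v) % 2 == 0:
        phases[-1] = 0.0
    return np.fft.irfft(np.abs(V) * np.exp(1j * phases), n=len(v))

def phi_null(sa, sb, H=250, seed=0):
    """Null ISC distribution phi_{a,b}: correlations between independently
    phase-randomised surrogates of two scale vectors."""
    rng = np.random.default_rng(seed)
    return np.array([np.corrcoef(ft_phase_randomise(sa, rng),
                                 ft_phase_randomise(sb, rng))[0, 1]
                     for _ in range(H)])
```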

Summing the Elements to Produce the Null Distribution
For each recording, the two elements were added together to form the final null distribution $\chi_{a,b}$:

$$\chi_{h,a,b} = \phi_{h,a,b} + \bar{\rho}_{a,b}.$$

The $\bar{\rho}_{a,b}$ in question depends on the standardised length of the analysed recording. This is because in practice there were 3 different versions of $\bar{\rho}_{a,b}$: one for each standard MC signal length (350, 400, 500 s).
The test statistic $T_{a,b}$ is then obtained by standardising $r_{a,b}$ by the mean and standard deviation of the final normal null distribution $\chi_{a,b}$:

$$T_{a,b} = \frac{r_{a,b} - \bar{\chi}_{a,b}}{\sigma_{\chi_{a,b}}},$$

where $\bar{\chi}_{a,b}$ and $\sigma_{\chi_{a,b}}$ denote the mean and standard deviation of $\chi_{a,b}$, respectively.

Multiple Testing of Test Statistics under Dependency
The second part of the method in [46] involves performing multiple testing on the asymptotically normal test statistics. We test which elements of $T_{a,b}$ are significantly different from the standard normal distribution. The procedure is as follows [46]: 1. Calculate the test statistics $T_{a,b}$ for $1 \le a < b \le u$, in our work via the MC method above.
2. Obtain the threshold $\hat{t}$, above which we reject the hypotheses. To do so, first we calculate how many hypotheses are rejected, $R(t)$, for threshold value $t$, $0 \le t \le d_u$, where $d_u = \sqrt{4 \log u - 2 \log(\log u)}$:

$$R(t) = \sum_{1 \le a < b \le u} I\{|T_{a,b}| \ge t\},$$

where $I\{\cdot\}$ denotes the indicator function. Second, we calculate the threshold $\hat{t}$:

$$\hat{t} = \inf\left\{ 0 \le t \le d_u : \frac{G(t)\,(u^2 - u)/2}{\max(R(t), 1)} \le \alpha \right\}, \qquad (11)$$

where $\inf$ denotes the infimum, and $G(t) = 2 - 2\Phi(t)$, where $\Phi(t)$ is the cumulative distribution function of the standard normal distribution. If equation (11) does not produce a value for $\hat{t}$, set $\hat{t} = 2\sqrt{\log u}$.
As such, we reject all $H_{0,a,b}$ with associated $|T_{a,b}| \ge \hat{t}$. This indicates that, when the FDR is controlled at $\alpha$ under dependency, the rejected inter-scale relationships are statistically significant.
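A grid-search sketch of this thresholding step (the grid approximation of the infimum and the erf-based $\Phi$ are implementation conveniences, not the paper's code):

```python
import math
import numpy as np

def fdr_threshold(T, alpha=0.05):
    """Threshold in the style of the procedure from [46]: the smallest t
    (on a grid over [0, d_u]) whose estimated false discovery proportion
    is <= alpha, falling back to 2*sqrt(log u) if none qualifies.
    T is the u x u matrix of test statistics; only a < b is tested."""
    u = T.shape[0]
    m = u * (u - 1) / 2.0                        # number of hypotheses
    stats = np.abs(T[np.triu_indices(u, k=1)])
    d_u = math.sqrt(4 * math.log(u) - 2 * math.log(math.log(u)))
    for t in np.linspace(0.0, d_u, 2000):
        G = 1.0 - math.erf(t / math.sqrt(2.0))   # G(t) = 2 - 2*Phi(t)
        R = np.sum(stats >= t)                   # rejections at threshold t
        if G * m / max(R, 1) <= alpha:
            return float(t)                      # approximate infimum
    return 2.0 * math.sqrt(math.log(u))
```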

Acknowledgements
This work was funded by an EPSRC DTP scholarship. Many thanks to Timothy G. Constandinou for his PhD supervision, and to Joseph O'Doherty and the Sabes lab for making their data publicly available [78]. Many thanks to Rajendra Bhansali and Sofia Olhede for feedback and discussion. Thanks to Melissa O'Neill, Brittany Scheid, Alexandros Leontitsis, Erol Kalkan, Yair Altman et al., Ipek et al., and David Groppe for respectively making their PCG random number generation [89], FT phase randomisation [90], IAAFT phase randomisation [91], pre-whitening [92], figure-saving [93], normality testing [94], and BH-method FDR testing [95] code publicly available.
The author states that they have no competing interests. Correspondence and material requests should be addressed to oscar.savolainen.96@gmail.com.
6 Supplemental Material

Code and Full Results
The code used throughout this work is available at [49]. The full results, including the ISCMs, the MC-produced null distributions, and the SI-CSD processed neural data time and Fourier domain plots, are all available at [48].

Phase Randomisation Implementation
Phase-randomisation is used to create surrogate signals that have the same statistical properties as the original signal, but are otherwise random. It is commonly performed to determine the significance of the cross-correlation between two autocorrelated signals [8,39,41-44]. There are multiple methods for phase-randomisation. The two most common methods for stationary signals are the Fourier Transform (FT) method and the more advanced Iterated Amplitude Adjusted Fourier Transform (IAAFT) method [96].
For non-stationary signals, a method was proposed in [97] that randomises the phases of the CWT of the signal while preserving the amplitudes of the coefficients, and then takes the inverse CWT (ICWT). This largely maintains the time-frequency structure of the signal, so any non-stationary structure is largely respected in the surrogate signals. However, when testing inter-frequency correlations, it is not obvious that one wishes to maintain the non-stationary structure within the power of each frequency band. If the non-stationary structure varies similarly between two separate frequency bands, that is exactly the sort of relationship one is looking for, so controlling for it in the surrogate data seems counter-intuitive.
In this work, the FT method was used to create surrogate $s_a$ signals. This is done by taking the Fourier transform of the signal, randomising the phases, and transforming the data back via the IFFT. This preserves the first-order autocorrelation properties of the signal. However, unlike the IAAFT, it does not maintain the sample histogram. As such, phase-randomised surrogates often took negative values, even though the original $s_a$ values are strictly non-negative. Fig. 11 shows examples of the original, IAAFT and FT phase-randomised time series, Fourier transforms and sample histograms, and the phase-randomised ISCM for both methods. It can be observed that both the Fourier transform and the histogram are maintained by the IAAFT phase-randomisation, whereas only the Fourier transform is maintained by the FT method.
Ultimately, the choice of the FT method over the IAAFT was due to the normality of the test statistics. The multiple testing method proposed in [46] requires normal distributions under the null, and the inter-scale distributions produced by the IAAFT method were often non-normal. As such, the sample histograms were sacrificed for the sake of normality. The final results of the statistical testing were compared for the two methods. The FT method found slightly fewer significant results, making it more conservative. The null distributions for both methods are provided in [48]. For the IAAFT method, the null distributions are only provided for channels 1-20, due to the computational expense of computing them. The code to produce them is available at [48].
If one wishes to maintain the non-stationary structure of the signal in the surrogate data for a specific application, that can also be done, e.g. [97], assuming normality under the null. This would be done by applying the CWT phase-randomisation method [97] to each s a of one's data. The computational expense is much greater, but is still reasonable for short signals.

Testing the Normality of the Phase-Randomisation Null Distributions
The vast majority of null distributions produced by FT phase-randomisation were normal. As such, they were compatible with the multiple testing procedure from [46]. The normality was confirmed visually for each analysed recording by using indicator maps of the statistically normal distributions. Fig. 12 shows the indicator map of normal distributions for a small number of randomly selected recordings, as an example.

Conservatively Removing the Effects of the Non-Dyadic Sampling
In this work, we chose to add the white noise mean ISCM to the distribution of ISCMs derived from phase-randomised versions of S. This is equivalent to subtracting the white noise mean ISCM from the measured neural ISCMs, which is very conservative. E.g., if a neural ISC equals 1, and the mean value for white noise input is 0.8, the adjusted value will be 0.2. This value may not be significant in comparison to the phase-randomised null distribution, whose values can in theory range between -1 and 1 and which is centred at 0. As such, even a measured perfect correlation may not be significant in our test. This is particularly the case for low-low scale correlations between significantly overlapping scales, because ISCM elements near the diagonal are most affected by the non-dyadic sampling (see Fig. 13 (a-b)). Additionally, the variance of the distribution obtained from phase-randomised WPS is generally greater for low-low scale correlations (Fig. 13 (c-d)). The proposed test is therefore extremely conservative for significantly overlapping scales. For ISCs between non-significantly overlapping scales, the test gives more reliable answers.
The author did not find any obvious unbiased way to eliminate the non-linear distorting effects of the non-dyadic sampling, the signature of which is shown in the white noise mean ISCM. As such, simply subtracting it provided a conservative solution to determining significant measured ISCs. It generally has no effect on ISCs sufficiently far from the diagonal ( Fig. 13 (a-b)), as scales are asymptotically uncorrelated with increased frequency-domain distance. As such, it is included to prevent false positives near the diagonal.

Spectral Interpolation Implementation
The original paper on Spectral Interpolation (SI) outlines the general method [82]. In this work, to remove the line noise, we transformed the signal to the Fourier domain using the FFT. We identified the peak line noise frequency at ∼60 Hz, then spectrally interpolated it and all of its harmonics up to the Nyquist frequency. The interpolated segments had a width of 4 Hz. To perform the SI, we averaged the power in the 3 Hz bands on either side of the interpolated segment to calculate the segment's edge values. We then fitted a 1st-order polynomial between the two edge values, set the interpolated segments' frequency magnitudes equal to the fitted line, and randomised the phases. This process is shown in Fig. 14.
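The steps above can be sketched for a single line-noise peak as follows (a Python/NumPy illustration; the bin bookkeeping and the uniform phase draw are simplifying assumptions, and the paper's MATLAB implementation also handles all harmonics):

```python
import numpy as np

def spectral_interpolate(x, fs, f0, half_width=2.0, edge_band=3.0, seed=0):
    """Sketch of SI for one peak: replace FFT magnitudes in
    [f0 - half_width, f0 + half_width] with a straight line fitted between
    the mean magnitudes of the edge_band-wide bands on either side, and
    randomise the phases of the replaced bins."""
    rng = np.random.default_rng(seed)
    n = len(x)
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    seg = (f >= f0 - half_width) & (f <= f0 + half_width)
    lo = (f >= f0 - half_width - edge_band) & (f < f0 - half_width)
    hi = (f > f0 + half_width) & (f <= f0 + half_width + edge_band)
    # 1st-order (linear) fit between the two edge magnitudes
    mags = np.linspace(np.abs(X[lo]).mean(), np.abs(X[hi]).mean(), seg.sum())
    X[seg] = mags * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, seg.sum()))
    return np.fft.irfft(X, n=n)
```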
We also spectrally interpolated across either one or two sharp components that were often present in the Fourier transforms of the recordings. These components varied in location between ∼2.4-2.8 kHz, and did not correspond to line noise harmonics. These interpolated segments were generally wider than 4 Hz. We also spectrally interpolated the frequency domain between 55 and 65 Hz, due to abnormally large components in this range. We performed no other SI or artifact removal.
[Figure caption fragment: Black points indicate that, out of 10 normality tests, at least 9 failed to reject the null ISC distribution as non-normal. White points indicate that at least 2 tests found the null ISC distribution to be non-normal. The FDR was controlled at α = 0.05 for each test individually using the procedure from [98].]

Current Source Density Referencing Implementation
For a review of Current Source Density (CSD) referencing, see [99]. In a Utah array, CSD referencing involves subtracting the average of the four electrodes around the referenced electrode from the recording. In this work, the procedure involved lowpass filtering the referencing signals at 500 Hz, using the MATLAB function lowpass. We then averaged the lowpassed data from the 4 electrodes that formed a square around the referenced electrode to produce the reference signal. This reference signal was then subtracted from the referenced signal. Some electrodes lay on the edge or corner of the Utah array. In these cases, we used as references only the 3 or 2 of those electrodes that would have been used had the electrode had 4 neighbours. Examples of the different geometries are shown in Fig. 15 (a). An example of CSD referencing of a neural signal is shown in Fig. 15 (b).
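This referencing scheme can be sketched as follows (Python/NumPy; an ideal FFT lowpass is used as a stand-in for the MATLAB lowpass filter, and the grid layout is a simplified illustration of a Utah array):

```python
import numpy as np

def lowpass_fft(v, fs, fc):
    """Ideal FFT lowpass: zero all bins above fc (a simple stand-in for
    the MATLAB lowpass filter used in the paper)."""
    V = np.fft.rfft(v)
    f = np.fft.rfftfreq(len(v), d=1.0 / fs)
    V[f > fc] = 0.0
    return np.fft.irfft(V, n=len(v))

def csd_reference(grid, row, col, fs, fc=500.0):
    """CSD-reference electrode (row, col) of a (rows, cols, samples) array:
    subtract the mean of the lowpassed signals of its available N/S/E/W
    neighbours (edge/corner electrodes simply have fewer neighbours)."""
    nbrs = [lowpass_fft(grid[r, c], fs, fc)
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
            for r, c in [(row + dr, col + dc)]
            if 0 <= r < grid.shape[0] and 0 <= c < grid.shape[1]]
    return grid[row, col] - np.mean(nbrs, axis=0)
```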
The neural medium may be non-capacitive [83]. As such, using only sub-500 Hz components in the reference signal was a somewhat arbitrary choice. However, it had the benefit of preventing high amplitude spikes from affecting the referencing.

Pre-whitening Implementation
For the final pre-processing step, the CSD-referenced data was pre-whitened. First, we removed samples $x_{n-k}, \ldots, x_n$ such that $k$ was minimised while ensuring that $x_{n-k-1}$ was within 1% of $x_1$. On average, this removed around 50 time steps from the end of the signal, and so had a negligible effect on the signal lengths. Clipping the end of $x$ allowed the assumption of periodicity in the Fourier transform to hold, removing the effects of cyclic discontinuities during pre-whitening. The danger of not accounting for cyclic discontinuities has recently been discussed in the literature [40]. Clipping was preferable to using a classical window, which would have distorted the normalisation of the ISC, since windowing produces smaller standard deviations at the edges. The effects of clipping, pre-whitening, and using a Hamming window prior to pre-whitening on the time series and Fourier transform of a neural signal are shown in Fig. 16.
Post-clipping, during pre-whitening, the power in each frequency band was set to be uniform. This was done by transforming the data via the FFT, normalising the Fourier amplitudes while preserving the phases, and transforming the data back via the IFFT.
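The clip-then-whiten procedure can be sketched as below (Python/NumPy; the "within 1%" criterion is interpreted here relative to the signal's peak amplitude, which is an assumption about the exact MATLAB implementation):

```python
import numpy as np

def clip_end(x, tol=0.01):
    """Drop the minimal number of trailing samples so that the last kept
    sample is within tol (relative to peak amplitude) of the first sample,
    reducing the cyclic discontinuity seen by the FFT."""
    thresh = tol * np.max(np.abs(x))
    j = len(x) - 1
    while j > 0 and abs(x[j] - x[0]) > thresh:
        j -= 1
    return x[:j + 1]

def prewhiten(x):
    """Flatten the Fourier amplitude spectrum while preserving the phases."""
    X = np.fft.rfft(x)
    mag = np.abs(X)
    mag[mag == 0] = 1.0            # leave empty bins untouched
    return np.fft.irfft(X / mag, n=len(x))
```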

Pseudo-Random Number Generation for Monte Carlo White Noise Processes
A subtle point worth mentioning is that the MATLAB rand function uses the Mersenne Twister to produce pseudo-random numbers. The use of Mersenne Twisters in MC simulations has been criticised [100]. As such, the author used the PCG family of RNGs [89] to produce pseudo-random numbers for the white noise processes. The state seed was set to 1 in all cases, and the sequence seed was set to $n + v$. We used a MC sequence length of either $n = F_s \times 350$ s $= 8544922$, $n = F_s \times 400$ s $= 9765625$, or $n = F_s \times 500$ s $= 12207031$ samples. The C code from [89] was adapted to create the 32-bit white noise processes and save them to '.txt' files. These were then accessed by the appropriate MATLAB functions. In MATLAB, the white noise processes were then processed as described in Section 4.4.1 and Fig. 10.
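For readers working in Python, NumPy ships a PCG-family generator (PCG64), so an analogous setup can be sketched as below (NumPy's PCG64 and its SeedSequence interface differ from the 32-bit C generator of [89], and Gaussian deviates are used here purely for illustration):

```python
import numpy as np

def pcg_white_noise(n, state_seed=1, sequence_seed=0):
    """White noise drawn from a PCG-family generator rather than the
    Mersenne Twister, with separate state and sequence seeds."""
    seed = np.random.SeedSequence(entropy=state_seed,
                                  spawn_key=(sequence_seed,))
    gen = np.random.Generator(np.random.PCG64(seed))
    return gen.standard_normal(n)
```

Fixing the state seed and varying the sequence seed per MC iteration, as in the paper, yields reproducible yet mutually independent streams.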