## Participants

Thirty-three university students were recruited to participate in the experiment (21 males and 12 females, mean age = 23.13 years and standard deviation = 5.16 years). The sample size was determined following conventional EEG studies [7]. The recruiting process stopped when the sample size reached 30. Three more participants registered for the study before the termination of the recruitment, thus their data were also collected. All participants had a normal or corrected-to-normal vision. All participants gave informed consent before participation. The study protocol was approved by the local IRB committee (National Taiwan University Research Ethics Committee, 201606ES031). All methods were carried out in accrodance with the relevant guideliines and regulations.

## Experimental Design

The behavioral task was modified from the color recall tasks for WM precision [33, 34]. The stimuli were presented on an LCD monitor on a black background at a viewing distance of 60 cm. The colors were selected from a circle in CIELAB space [35], centered at L = 70, a = 20, b = 38, in CIE coordinates with a radius of 60. The items to be remembered were a set of color squares with a size 2°x2° of visual angle. The color of each item was sampled from 45 possible color values at every 8° of the circle. The locations of the items were randomly chosen but equally spaced concentrically, 4.5° away from the center of the monitor. The color wheel for color recall was constructed with 360 color values, sampled from 1° to 360° of the circle in the color space. The color wheel was centered on the screen, with an inner and outer radius of 6° and 8.2° of visual angle.

Each trial began with a concurrent presentation of a beep sound (500 Hz) and a central fixation cross (gray, 1°\(\times\)1°). The beep sound lasted for 100 ms, and the fixation cross remained on display throughout the trial. After 500 ms, a set of color squares was displayed for 100 ms, followed by a 1000-ms delay period. After the delay period, a set of outlined squares was presented at the location of each item during stimulus presentation for 1 s (the probe period). One of the squares had thicker edges, indicating the target item for the observer to recall. A color wheel was displayed as the probe squares vanished. The observer had to identify the target color by mouse-clicking the corresponding color on the color wheel. During the response period, the color in the target location kept updating to the color selected by the mouse cursor (Fig. 1a). The subsequent trial began 500 ms after a target color was picked from the color wheel. In different experimental conditions, the set size of the sample array varied between 1, 2, 4, or 6 items. The four set-size conditions were randomly interleaved across trials. All participants performed 90 trials for each condition, making 360 trials in total.

## Mixture Model Analysis

Behavioral performance was assessed using the mixture model analysis [33]. In each trial, the degree of error was obtained by calculating the angular deviation between the coordinates of the reported color and the correct target color in the color wheel. The model proposed three sources of WM errors: 1) the probability that the target color was not stored in the WM system, 2) the probability of misreporting a non-target color instead of the target color, and 3) information was stored with noise in WM. Participants were assumed to make a uniformly random response when they could not recall the target color. In addition, all items were assumed to be encoded with equal resolution and encoding to decay at an equal rate. The probability of the reported color value in each trial can therefore be described as follows:

$$p\left(x\right)=\left(1-pU-pNT\right){\varPhi }_{\sigma }\left(x-\theta \right)+pU\frac{1}{2\pi }+pNT\frac{1}{m}\sum _{i=1}^{m}{\varPhi }_{\sigma }\left(x-{\theta }_{i}^{*}\right)$$

,

where \({\varPhi }_{\sigma }\) represented the von Mises (circular Gaussian) distribution with mean zero and standard deviation \({\sigma }\), \(x\) was the reported color value, \(\theta\) was the target color value, \({\theta }_{i}^{*}, i\in \{1,\dots ,m\}\) were the *m* non-target color values, \(pT\) was the probability of correctly remembering the target color, \(pU\) was the probability of random guessing, and \(pNT\) was the probability of misremembering the target location (Fig. 1b). The index for WM precision (\(\kappa =1/{{\sigma }}^{2}\)) was the concentration number of the von Mises distribution, which was the reciprocal of error variance.

## EEG Recording

EEG was continuously recorded throughout the experiment using SynAmps2 amplifiers (Compumedics Neuroscan) using Ag/AgCl electrodes mounted in a 64-channel electrode cap (Quik-Cap, Neuromedical Supplies). The EEG cap followed the international 10–20 system for electrode placement. Impedances of all electrodes were kept below 5 kΩ. The sampling rate was set at 2000 Hz. The horizontal and vertical electrooculograms were recorded along with the EEG recordings. The recordings were filtered with a low-pass filter of 500 Hz, and AFz served as the ground electrode.

## EEG Data Preprocessing

The raw EEG data were offline-referenced to the average of M1 and M2 electrodes. The data were epoched from − 1500 to 3000 ms relative to the onset of the sample array. Epochs containing artifacts of muscle activity, head or body movements, and electrode noise were rejected by visual inspection. Eye movements and blink artifacts were identified and removed with independent component analysis using the EEGLAB toolbox [36]. Epochs deviating more than 200 µV were also rejected. Electrodes containing more than 50% of such ‘bad’ trials were replaced by spline interpolation of their neighbors. 19 out of 33 participants required such interpolation (three channels for one participant, two channels for four participants, and one channel for 14 participants). Most removed electrodes were located in the forehead or temporal sites (FP1/2, FPz, T7/8, TP7/8, and C6). After this procedure, none of the participants was rejected for further analysis. The preprocessed EEG signal was further transformed into its scalp current sources via the surface Laplacian method [37]. The application of surface Laplacian highlighted the local features and reduced the volume-conduction effect.

## Holo-Hilbert Spectral Analysis (HHSA)

The preprocessed EEG data were decomposed with HHSA using in-house MATLAB codes. The codes for HHSA are available in the public repository (http://osf.io/243ps). The HHSA involves two layers of nested EMD and a 3D spectral representation for the signal [29]. EMD is an iterative process that decomposes data into a set of intrinsic mode functions (IMF, Huang et al., 1998). Each IMF is symmetric locally with zero mean by definition. Therefore, the instantaneous phase can be obtained directly with the Hilbert transform or later proposed methods such as normalized Hilbert transform or direct quadrature (DQ, Huang et al., 2009). Compared to conventional approaches, HHSA has the following advantages: 1) it is not limited to the uncertainty principle of time-frequency analysis (i.e., the trade-off between time and frequency resolutions) by providing instantaneous frequency [38, 39], so the data can be represented in high precision in the spectral domain; 2) the structure of data itself determines the timescales being examined, so *a priori* assumptions for frequency bands are not needed; 3) the envelope function of each frequency component can be decomposed again into a set of second-layer IMFs, where each second-layer IMF represents modulation of the envelope function within a dyadic timescale; 4) EMD is an additive decomposition method; therefore the original signal can be reconstructed by direct summation over IMFs.

In HHSA, the first-layer IMFs are obtained by applying EMD to the input data (Fig. 2b, the left column). In practice, EMD acts as a dyadic filter bank to the data so that each IMF represents oscillatory activity in distinct log-2 timescales [40, 41]. For instance, in an EEG signal with a 500-Hz sampling rate, the frequency range for the first IMF would be a Gaussian-like bandpass filter centering at around 128 Hz; the second IMF peaks at 64 Hz, and so on for consecutive IMFs (Fig. 4a). Since the envelopes of the first-layer IMFs are also broadband signals, they can be decomposed again with EMD (Fig. 2b, the right column). The second-layer IMFs describe the variation of envelope functions in distinct log-2 timescales; hence we obtain a representation for each envelope in the frequency domain. To distinguish the instantaneous frequencies of the first- and second-layer IMFs, we will refer to instantaneous frequencies of the first-layer IMFs as “carrier frequency” (CF) and to second-layer IMFs as “AM frequency” (AMF). The *Holo-Hilbert spectrum* (HHS) is the 3D representation of a signal over time (*t*), CF, and AMF (Fig. 2c). Extending the time-frequency representation with AMF shows the energy fluctuation induced by multiplicative interactions. Note that the AMF domain describes the results of multiplicative interaction from all possible sources. The PAC between any two signals can be evaluated via phase synchronization indices between the first- and second-layer IMFs (Fig. 2d). For instance, the theta-alpha PAC can be evaluated by calculating the directed phase-lag index (dPLI) [42] between the theta-band carrier and the frequency-matched AM component of the alpha-band envelope. The difference between AM and PAC under HHSA is illustrated in the supplementary files (Fig. S1).

In this study, HHS is projected along the three dimensions to highlight different aspects of the full spectrum. Projecting the HHS onto the *CF-time* plane created the *CF-t* spectrum (Fig. 3a), which captured the intra-mode frequency variation of the carriers. *CF-t* spectrum was equivalent to the spectrogram obtained from the Hilbert-Huang Transform (HHT) [38] that has been frequently used for analyzing the instantaneous frequency and amplitude variations of biological signals (e.g., [43, 44]). When HHS was projected onto the *AMF-time* plane, the result highlighted the dynamic of cross-scale interaction for a given range of carrier frequencies (the *AMF-t* spectrum, Fig. 3b). When HHS was projected onto the *AMF-CF* plane, the results showed the overall structure of cross-scale interactions between different frequency components in a given period (the *AMF-CF* spectrum, Fig. 3c). In this study, the resolution of HHS was set as 4 ms for time and 0.5 Hz for both CF and AMF.

Transient bursts in biological signals sometimes cause mode mixing, i.e., a single IMF contains multiple frequency components with classical EMD. Here, we applied an improved version of EMD utilizing sinusoidal masking signals in conjunction with the original EMD algorithm to reduce the problem [43, 45, 46]. A set of masking signals with initial phases 0, π/2, π, and 3π/2 were added into a pre-decomposed IMF obtained from regular EMD to enhance decomposition. The optimal amplitude of the masking signals is 1.6 times above the average amplitude of the components through experience [46]. We used a factor of 2 for both layers of EMD. The frequency of masking signals was the same as the mean instantaneous frequency of the pre-decomposed IMF. Instantaneous frequencies of IMFs were directly calculated by taking time derivatives of instantaneous phases obtained from DQ.

## Intra- and Inter-areal PAC under the HHSA Framework

IMF is a function that requires (1) symmetric locally to zero, and (2) there exists only one extremum between two successive zero-crossing points. According to the above definition, the instantaneous phases for the first- and second-layer IMFs can be obtained directly with DQ. Given the phase functions of carriers and their AMs, the degree of PAC can be measured by phase synchronization indices. Here, we used the dPLI to quantify the degree of PAC while taking the leading-lagging relationship into account:

$${\text{d}\text{P}\text{L}\text{I}}_{fg}= \frac{1}{N}\sum _{n=1}^{N}{H\left(Im\right(e}^{i\left({\phi }_{f}\left(n\right)-{\phi }_{g}\left(n\right)\right)}\left)\right),$$

where \({\phi }_{f}\) and \({\phi }_{g}\) denote the phase functions of two oscillations, and *N* denotes the number of sampling points within a time window. *H* is the Heaviside function (H(x) = 0 if x ≤ 0; H(x) = 1 if x > 0), and \(Im\) denotes that only the imaginary part of the relative phase is considered. The dPLI is a directed synchronization measure bounded by 0 and 1. If signal *f* is phase-leading to signal *g*, then \({0.5<\text{d}\text{P}\text{L}\text{I}}_{fg}\le 1\); if signal *g* is phase-leading to signal *f*, then \({0\le \text{d}\text{P}\text{L}\text{I}}_{fg}<0.5\). When PAC employs the dPLI formula under the HHSA framework, one phase function is from the carrier of a slower oscillation, and the other is from the frequency-matched AM component of a faster oscillation. This approach is similar to the Holo-Hilbert cross-frequency phase clustering (HHCFPC) that has been demonstrated as efficient in quantifying intra- and inter-areal CFC in the process of WM [17].

Combining HHSA and phase-based statistics to evaluate PAC has the following advantages over conventional methods: First, phase-based statistics, such as the phase-lag index, only consider the imaginary part of the spectrum. Therefore, they are robust to noise. Second, nonstationarity in the high-frequency power leads to instability in the Hilbert spectrum and is resolved via EMD. Third, different modulating waves may interfere when the modulation strength is determined by the distribution of high-frequency amplitude over low-frequency phase bins (e.g., [22]). In summary, the confounding of power is eliminated by utilizing phase-based statistics. The waveform shape is essentially the nonstationarity issue, which is resolved by EMD. The phase-reset factor is related to separating co-occurring phenomena from actual modulation.

## Statistical Analyses of EEG Data

We used MATLAB and the FieldTrip Analysis Toolbox for the statistical analysis of EEG data 47. Epochs were time-locked to the onset of the sample array for all analyses.

**Evaluating the Effect of WM Load on EEG Activity.** For preliminary analysis, we calculated the mean amplitude of each IMF across all electrodes for each set size and participant, then tested the effect of load manipulation with a repeated-measures ANOVA for set sizes 2, 4, and 6 at each time point. The single-item condition served as active control for the task. ANOVA results were corrected for multiple comparisons by a nonparametric cluster-based permutation test of 5000 iterations. The method only assumes that every experimental condition is associated with a probability distribution. Unlike parametric tests, the validity of the permutation test does not depend on the probability distribution of the data or the cluster-forming method [48]. Both the cluster-forming threshold and the critical value of significance were set at *p* = .05.

Based on previous studies [6, 7, 20], we choose mid-frontal electrodes (AF3/4, F1/2/3/4, FC1/2/3/4, Fz, and FCz) and parieto-occipital electrodes (P1/2/3/4/5/6/7/8, PO3/4/5/6/7/8, O1/2, Pz, POz, and Oz) for area-based analysis. Averaged *CF-t* power spectra in frontal and parieto-occipital electrodes were calculated separately, ranging from 1 to 50 Hz. Load effects for the *CF-t* spectra were evaluated with the same procedure of repeated-measures ANOVA as the preliminary analysis. If any significant cluster was identified, the *AMF-t* spectrum of the corresponding CF range was calculated and tested with the same ANOVA procedure.

**Correlation between Individual WM Precision and EEG Activity.** The same *CF-t* and *AMF-t* representations of parieto-occipital EEG power were also used for correlation analyses. EEG spectra and \(\kappa\) values from the set-size one condition were subtracted from those from set sizes 2, 4, and 6, and were then transformed into z-scores for regression analysis. For each spectrum, a general-linear model (GLM) was constructed in which the z-transformed \(\kappa\) values served as a regressor to predict the z-transformed EEG power in each spectral point. Since both \(\kappa\) values and EEG power varied systematically with increasing set size, the regression weights were separately evaluated for set sizes 2, 4, and 6, then averaged together for statistical tests (i.e., set size served as a control variable for the GLM). The approach was similar to previous within-subject studies [49, 50] but on a between-subject basis. The same cluster-based permutation procedure was applied to correct the results of GLM for load-effect analysis, but both cluster-forming threshold and critical value were set at *p* = .05 for two-tailed tests. In addition, we also tested EEG activity in frontal electrodes to see if there was any WM correlate in the cluster. Correlations between frontal *CF-t* spectrum and WM precision were analyzed first; if any significant correlate was identified, the *AMF-t* spectrum of the corresponding CF range was analyzed.

**Intra- and Inter-areal PAC.** The oscillations of interest were beta, alpha, and theta waves for this study. In this dataset, the frequency ranges of IMFs 4, 5, and 6 represented oscillatory activities in the beta-, alpha-, and theta-band, respectively. The intra- and inter-regional PAC were calculated as follows: first, we obtained the phase of theta waves (from IMF 6) and the phase of frequency-matched alpha/beta AM components (from IMFs 4 and 5) through HHSA. Second, the cross-frequency dPLI between theta phase and frequency-matched alpha/beta AMs across all possible electrode pairs was calculated and then averaged together according to the frontal and parieto-occipital electrodes of interest for each set-size condition. Time windows for dPLI evaluation were determined according to the previous results of load effects over HHS. The first interval ranged from 0 to 0.5 s after the onset of the sample array, and the remaining part of the delay period constituted the second interval. Time windows for the probe period were also divided into the first 0.5 s and the remainder interval for analysis. The effect of load manipulation was evaluated separately by a repeated-measures ANOVA for different time windows and electrode clusters. In addition, the phase synchronization of frontal and parieto-occipital regions was tested by the same procedure. The overall results were corrected for multiple testing using the Benjamini-Hochberg procedure [51] with a false-discovery rate (FDR) of 0.05.