## Animals

This study was performed in strict accordance with the “Guiding Principles for the Care and Use of Animals in the Field of Physiological Science” published by the Japanese Physiological Society, and the recommendations in the ARRIVE guidelines (https://arriveguidelines.org/). The experimental protocol was approved by the Committee on the Ethics of Animal Experiments at the Research Center for Advanced Science and Technology, University of Tokyo (Permit Number: RAC 130107). All surgeries were performed under isoflurane anesthesia. All efforts were made to minimize animal suffering. Following the experiments, animals were euthanized with an overdose of pentobarbital sodium (160 mg/kg, i.p.).

Four male Wistar rats were used in this study (11–13 weeks old; body weight, 290–330 g). The protocols for animal preparation and neural recordings have been described elsewhere [20, 31, 37]. Briefly, the rats were anesthetized with isoflurane and air at a concentration of 3% for induction and 1% for maintenance during the surgery and experiments. Animals were held in place with a custom-made head-holding device. Atropine sulfate (0.1 mg/kg) was administered pre- and post-surgery to reduce the viscosity of bronchial secretions. A skin incision was made at the start of surgery under local anesthesia using lidocaine (0.3–0.5 mL). A needle electrode was subcutaneously inserted into the right forepaw and used as the ground. A small craniotomy was performed close to bregma in order to embed a 0.5 mm-thick integrated circuit socket as a reference electrode, with electrical contact to the dura mater. The right temporal muscle, cranium, and dura overlying the auditory cortex were surgically removed. The exposed cortical surface was perfused with saline to prevent desiccation. Cisternal cerebrospinal fluid drainage was performed to minimize cerebral edema. The right eardrum, ipsilateral to the exposed cortex, was ruptured and waxed to ensure unilateral sound inputs from the ear contralateral to the exposed cortex. Respiratory rate, heart rate, and hind-paw withdrawal reflexes were monitored throughout the surgery to ensure maintenance of stable and sufficient anesthesia. For acoustic stimulation, a speaker (Technics EAS-10TH800, Matsushita Electric Industrial Co. Ltd., Japan) was positioned 10 cm from the left ear (contralateral to the exposed cortex). Test stimuli were calibrated at the pinna with a 0.25-inch microphone (4939, Brüel & Kjær, Denmark) and spectrum analyzer (CF-5210, Ono Sokki Co., Ltd., Japan). Stimulus levels were presented in dB SPL (sound pressure level in decibels with respect to 20 µPa).

## Electrophysiology

We used a surface microelectrode array and a depth electrode array (NeuroNexus, Ann Arbor, MI, USA) to simultaneously measure neural activity in the auditory cortex and thalamus, as previously described [28] (Fig. 1a). The surface microelectrode array, comprising a 10 × 7 grid within an area of 4 × 3 mm², was used to map local field potentials (LFPs) in the right temporal cortex and thereby identify the location of the primary auditory cortex (A1) [37]. The depth microelectrode array was then inserted perpendicular to the cortical surface in A1. The array comprised three shanks (6 mm in length), each of which carried 15 distal recording sites for MGv and 17 proximal sites for A1. The array simultaneously measured multi-unit activity (MUA) and LFPs from the MGv and A1. The diameter of the recording sites was 30 µm. The center-to-center inter-electrode distance was 120 µm. The most distal site was placed 100 µm from the tip of the shank, and the distance between the most proximal site in the MGv and the most distal site in A1 was 1,200 µm. Each electrode was composed of iridium oxide and coated with platinum black.

Neural signals were amplified with a gain of 1,000 using the Cerebus Data Acquisition System (Cyberkinetics Inc., Salt Lake City, UT, USA). The digital bandpass filter was 0.3–500 Hz for LFPs and 250–7,500 Hz for MUA. The sampling rates for LFPs and MUA were 1 kHz and 30 kHz, respectively. Multi-unit spikes were detected online from MUA by threshold crossing (−5.65 times the root mean square of the MUA).
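The threshold-crossing rule can be illustrated with a minimal offline sketch. The recording system performed detection online, so this is not the acquisition software's implementation; the function name `detect_spikes`, the use of the whole trace to estimate the root mean square, and the choice to report the first sample below threshold are assumptions made for illustration.

```python
import numpy as np

def detect_spikes(mua, fs=30_000.0, k=-5.65):
    """Return spike times (s) at which the MUA trace first drops below k * RMS.

    mua : 1-D band-passed MUA trace
    fs  : sampling rate in Hz (30 kHz in the text)
    k   : threshold multiplier (-5.65 in the text)
    """
    mua = np.asarray(mua, dtype=float)
    threshold = k * np.sqrt(np.mean(mua ** 2))  # negative-going threshold
    below = mua < threshold
    # A crossing is a sample below threshold whose predecessor was not.
    onsets = np.flatnonzero(below[1:] & ~below[:-1]) + 1
    return onsets / fs
```

Because the threshold is negative, only downward deflections are counted, and a deflection that stays below threshold for several samples yields a single event.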

Spontaneous activity was first characterized as MUA in a silent environment for 5 min. Auditory-evoked activity was then characterized in response to clicks and tone bursts. Clicks were presented at a rate of 1 Hz. Tone bursts were used to characterize the characteristic frequency (CF) at each recording site. CF was determined as the frequency at which test tones evoked MUA with the lowest intensity or the largest response at 20 dB SPL (the minimum intensity used in this study). Test frequencies ranged from 1.6 to 6.4 kHz with an increment of 1/3 octaves and intensities from 20 to 80 dB SPL with an increment of 10 dB. Each test tone was repeated 20 times in a pseudorandom order with an inter-tone interval of 600 ms. Recording sites at which CF was identified were defined as either MGv or A1, whereas those at which CF was not identified were excluded from further analyses.

For the grand average of 240-trial click-evoked LFPs from the depth array, one-dimensional current source density (CSD) analysis (Fig. 1b) was conducted, as described previously [28, 38, 39]. Briefly, twice the potential at a given depth (V0) was subtracted from the sum of the potentials at the upper and lower adjacent sites of a given depth (Vu and Vl), and then divided by the square of the distance (Δ*x*) between the recording sites (120 µm):

$$\frac{{V}_{u}+{V}_{l}-2{V}_{0}}{\Delta {x}^{2}}.$$
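As a minimal sketch of this second spatial difference, assuming the LFPs are stored as an array with one row per recording site ordered by depth (the function name `csd_1d` and the array layout are illustrative assumptions):

```python
import numpy as np

def csd_1d(lfp, dx_um=120.0):
    """One-dimensional CSD estimate as written in the text.

    lfp   : array of shape (n_sites, n_samples), rows ordered by depth
    dx_um : inter-site distance in micrometres
    Returns an array of shape (n_sites - 2, n_samples); the two edge sites
    have no neighbour on one side and are dropped.
    """
    lfp = np.asarray(lfp, dtype=float)
    upper, centre, lower = lfp[:-2], lfp[1:-1], lfp[2:]
    return (upper + lower - 2.0 * centre) / dx_um ** 2
```

The sketch follows the expression exactly as given. Some studies multiply this quantity by −1 (and by a conductivity term), so the sign that corresponds to a sink should be checked against the convention used in the cited references.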

Each layer was defined based on the CSD results as follows: L4 was first defined as the site with the earliest sink and adjacent sites as sinks and no source. L2/3 was defined as sites above L4 with sinks, followed by short sources. L5 was defined as two successive sites with sources below L4. Weak sinks were identified in deeper sites, of which the second deeper site was defined as L6.

## Transfer entropy

TEs of either thalamo-cortical, cortico-cortical, or cortico-thalamic projections were derived from MUA data of either spontaneous activity or click-evoked activity in a pairwise manner. TE was estimated from MUA data binarized with a bin of 1 ms (Fig. 2a). Bins with spikes were labeled as 1; those without spikes were labeled as 0. None of the bins contained two or more spikes. The TE of Y to X or \({TE}_{Y\to X}\) was defined as follows:

$${TE}_{Y\to X}=H\left({X}_{future}|{X}_{past}\right)-H\left({X}_{future}|{X}_{past},{Y}_{past}\right) \tag{1}$$

where *H*(A|B) represents the conditional entropy in information theory, which indicates the unpredictability of A when information on B is known. \({TE}_{Y\to X}\) estimates how spikes at electrode Y (\({Y}_{past}\)) improve the prediction of spikes at electrode X (\({X}_{future}\)), beyond the prediction based on past data of X (\({X}_{past}\)). Here, \({TE}_{Y\to X}\) was calculated as follows:

$${TE}_{Y\to X}(t, lag)=\sum _{{X}_{t+lag},\,{X}_{t+lag-d},\,{Y}_{t}}p\left({X}_{t+lag},{X}_{t+lag-d},{Y}_{t}\right) {log}_{2}\frac{p\left({X}_{t+lag}|{X}_{t+lag-d},{Y}_{t}\right)}{p\left({X}_{t+lag}|{X}_{t+lag-d}\right)} \tag{2}$$

where *t*, *lag*, and *d* represent the time, the transfer lag, and the delay between the future and past states of X, respectively. \({Y}_{t}\) represents the past state of electrode Y (\({Y}_{past}\)). \({X}_{t+lag}\) and \({X}_{t+lag-d}\) represent the future and past states, respectively, of electrode X (\({X}_{future}\) and \({X}_{past}\)). The past data of X were obtained from *d* bins before a given time point (*t* + *lag*), and *d* was optimized as follows, assuming that \({X}_{t}\) depends predominantly on the past state \({X}_{t-d}\):

$$d=\underset{d}{\mathrm{argmin}}\; H\left({X}_{t}|{X}_{t-d}\right) \tag{3}$$
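The binarization, the delay optimization, and the TE sum can be sketched together as a plug-in (histogram) estimator. This is a minimal illustration and not the authors' code: the function names, the search range `max_d`, and pooling all time points within the analysis window to estimate the joint probability are assumptions. The conditional probabilities are taken for the future state \({X}_{t+lag}\), following the definition in Eq. (1).

```python
import numpy as np

def binarize(spike_times_s, duration_s, bin_s=0.001):
    """Binary train: 1 for bins containing a spike, 0 otherwise (1-ms bins)."""
    n_bins = int(round(duration_s / bin_s))
    idx = np.floor(np.asarray(spike_times_s) / bin_s).astype(int)
    train = np.zeros(n_bins, dtype=np.int8)
    train[idx[(idx >= 0) & (idx < n_bins)]] = 1
    return train

def optimal_delay(x, max_d=30):
    """d minimizing H(X_t | X_{t-d}); max_d is an assumed search range."""
    x = np.asarray(x, dtype=int)
    best_d, best_h = 1, np.inf
    for d in range(1, max_d + 1):
        counts = np.zeros((2, 2))
        np.add.at(counts, (x[d:], x[:-d]), 1)        # axes: [present, past]
        p = counts / counts.sum()
        p_past = np.broadcast_to(p.sum(axis=0, keepdims=True), p.shape)
        nz = p > 0
        h = -np.sum(p[nz] * np.log2(p[nz] / p_past[nz]))
        if h < best_h:
            best_d, best_h = d, h
    return best_d

def transfer_entropy(x, y, lag, d):
    """Plug-in estimate of TE_{Y->X}(lag) in bits from binary trains."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    # Keep only t for which both t + lag - d and t + lag are valid indices.
    t = np.arange(max(0, d - lag), len(x) - lag)
    xf, xp, yp = x[t + lag], x[t + lag - d], y[t]
    counts = np.zeros((2, 2, 2))
    np.add.at(counts, (xf, xp, yp), 1)               # axes: [future, past, Y]
    p = counts / counts.sum()                        # p(xf, xp, yp)
    p_xp_yp = p.sum(axis=0, keepdims=True)           # p(xp, yp)
    p_xf_xp = p.sum(axis=2, keepdims=True)           # p(xf, xp)
    p_xp = p.sum(axis=(0, 2), keepdims=True)         # p(xp)
    nz = p > 0
    # p(xf | xp, yp) / p(xf | xp) = p(xf, xp, yp) p(xp) / (p(xp, yp) p(xf, xp))
    ratio = (p * p_xp)[nz] / (p_xp_yp * p_xf_xp)[nz]
    return float(np.sum(p[nz] * np.log2(ratio)))
```

For example, if X is a copy of a random binary train Y delayed by 3 bins, `transfer_entropy(x, y, lag=3, d=1)` is close to 1 bit, whereas other lags give values near 0.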

According to Eq. (1), we quantified \({TE}_{Y\to X}\) for given electrode pairs with either a short window (15 ms) or long window (10 s) (Fig. 2b).

## (i) Long-window TE with and without stimuli (long-window TEstim and long-window TEspon, respectively)

Long-window TE was derived using 10-s windows to assess whether information transmission differed depending on the state of the thalamo-cortical system (i.e., during sensory processing vs. the resting state). Long-window TEstim was derived from MUA over a continuous period of 240 s, during which clicks were presented every second. Long-window TEspon was derived from a separate 240-s period during which no stimulus was delivered. Ten sets of 10-s spike trains were randomly selected to derive the joint probability, \(p\left({X}_{t+lag},{X}_{t+lag-d},{Y}_{t}\right)\). Based on Eq. (1), 10 sets of TE were then estimated for transfer lags between 1 and 30 ms. Long-window TEs were ultimately defined as the median across the 10 sets for each *lag*.
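The window sampling and the median across sets can be sketched as follows. The sketch assumes binary trains with 1-ms bins, so that a 10-s window spans 10,000 bins, and takes the TE estimator as a caller-supplied function `te_fn(x_window, y_window, lag)`, which is a hypothetical interface. Whether the ten windows were allowed to overlap is not stated in the text; here they are drawn independently and may overlap.

```python
import numpy as np

def long_window_te(x, y, te_fn, lags=range(1, 31), n_sets=10,
                   win_bins=10_000, seed=0):
    """Median across randomly placed windows of te_fn, one value per lag.

    x, y     : binary spike trains of equal length (1-ms bins assumed)
    te_fn    : callable (x_window, y_window, lag) -> float  (hypothetical)
    win_bins : window length in bins (10 s at 1 ms per bin)
    """
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, len(x) - win_bins + 1, size=n_sets)
    te = np.array([[te_fn(x[s:s + win_bins], y[s:s + win_bins], lag)
                    for lag in lags]
                   for s in starts])          # shape (n_sets, n_lags)
    return np.median(te, axis=0)
```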

## (ii) Short-window TE

Short-window TE was computed using 15-ms windows to characterize information transmission in the thalamo-cortical system during the time window surrounding stimulus onset. The time course of information transmission for short-window TE was investigated using moving window analysis.

For trial *i* (= 1, …, 240), in response to a click delivered at time \({s}_{i}\), spike trains within a 15-ms post-stimulus latency were used to derive short-window TE. Based on short-window TE at stimulus onset (short-window TEonset), we first identified significant information transmission and the optimal *lag* of TE for a given electrode pair. For \(\left[{s}_{i}+1, {s}_{i}+15\right] := \left\{{s}_{i}+1\le t\le {s}_{i}+15\right\}\), the joint probability, \(p\left({X}_{t+lag},{X}_{t+lag-d},{Y}_{t}\right),\) was obtained to derive \({TE}_{Y\to X}\) at a given *lag* according to Eq. (2). Short-window TEonset was ultimately defined as the median across 240 trials for each *lag*.

We next characterized the time course of short-window TE, i.e., how TE evolved over time in the thalamo-cortical system during the time window surrounding stimulus onset. We computed the short-window TE for \(\left[T-\frac{15+lag}{2}, T+\frac{15+lag}{2}\right] := \left\{T-\frac{15+lag}{2}\le t\le T+\frac{15+lag}{2}\right\}\), where *T* ranged from \({s}_{i}-10\) to \({s}_{i}+40\) and the *lag* was the optimal value in the short-window TEonset. When *t* was not an integer, it was rounded to the nearest integer. The time course of short-window TE was ultimately defined as the median across 240 trials for each *T*. The earliest *T* at which TE > 0 after bias correction (see the next section) was defined as the onset latency of information transmission.
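The moving-window bounds and the onset-latency rule can be sketched as below. Several details are assumptions: times are in 1-ms bins, *T* is taken to run from 10 ms before to 40 ms after the click, half-integer bounds are rounded half up (the text does not state the tie-breaking rule), and the function names are illustrative.

```python
import math

def window_bounds(s_i, lag, width=15, t_min=-10, t_max=40):
    """List of (T, lower, upper) bin indices for the moving-window analysis.

    s_i   : click time of trial i, in 1-ms bins
    lag   : optimal transfer lag (ms) taken from short-window TEonset
    width : base window length in ms (15 in the text)
    """
    half = (width + lag) / 2
    bounds = []
    for T in range(s_i + t_min, s_i + t_max + 1):
        lower = math.floor(T - half + 0.5)   # round half up (assumption)
        upper = math.floor(T + half + 0.5)
        bounds.append((T, lower, upper))
    return bounds

def onset_latency(T_values, te_corrected):
    """Earliest T with bias-corrected TE > 0, or None if there is none."""
    for T, te in zip(T_values, te_corrected):
        if te > 0:
            return T
    return None
```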

## Statistical analyses for identification of significant information transfer

To identify electrode pairs with significant information transfer, we compared the above TEs derived from experimental data with those derived from shuffled data (TEshuffled). To generate the shuffled data, we randomly shuffled the inter-spike intervals (ISIs) of \({X}_{t}\) and \({Y}_{t}\) without changing the ISI distribution. Shuffling disrupted the temporal structure underlying functional connectivity between \({X}_{t}\) and \({Y}_{t}\).
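One way to shuffle ISIs while preserving their distribution is to permute the intervals and rebuild the train from the first spike. The sketch below assumes a binary train and keeps the first spike in place (which also keeps the last spike in place, because the sum of the ISIs is unchanged); how the original analysis anchored the shuffled train is not stated, and the function name is illustrative.

```python
import numpy as np

def shuffle_isi(train, rng):
    """Return a binary train with the same ISIs in a random order.

    train : 1-D binary array (1 = spike)
    rng   : numpy.random.Generator
    """
    train = np.asarray(train)
    spikes = np.flatnonzero(train)
    if spikes.size < 2:
        return train.copy()                   # nothing to shuffle
    isis = np.diff(spikes)
    shuffled = rng.permutation(isis)          # same ISIs, new order
    new_spikes = spikes[0] + np.concatenate(([0], np.cumsum(shuffled)))
    out = np.zeros_like(train)
    out[new_spikes] = 1
    return out
```

Applying this independently to X and Y, for example with `rng = np.random.default_rng()`, yields one surrogate pair; repeating it gives the null distribution of TEshuffled.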

To assess statistical significance of information transfer, we estimated p-values as the rank order of empirically identified TE values among the null distributions arising from 100 TEshuffled. For example, if the empirical TE was larger than the top 5% of 100 sets of TEshuffled, we regarded the p-value to be less than 0.05 [38]. We corrected for multiple comparisons across transfer lags (1–30 ms) using the false discovery rate (FDR) method [39]. Further, we defined a pair of functionally connected electrodes as those with significant information transfer within a time window of 5 ms or more (Fig. 3).
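The rank-based p-value, the correction across lags, and the 5-ms criterion can be sketched as follows. Three points are assumptions rather than statements from the text: the FDR procedure is taken to be Benjamini-Hochberg, the p-value is the plain fraction of shuffled values at or above the empirical TE, and "a time window of 5 ms or more" is read as at least five consecutive significant 1-ms lags.

```python
import numpy as np

def empirical_p(te, te_shuffled):
    """Fraction of shuffled TE values that are >= the empirical TE."""
    return float(np.mean(np.asarray(te_shuffled) >= te))

def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; True marks significant tests."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.flatnonzero(passed))    # largest rank that passes
        reject[order[:k + 1]] = True
    return reject

def longest_run(mask):
    """Length of the longest run of consecutive True values."""
    best = cur = 0
    for m in mask:
        cur = cur + 1 if m else 0
        best = max(best, cur)
    return best

def is_connected(pvals_by_lag, alpha=0.05, min_run=5):
    """True if >= min_run consecutive lags remain significant after FDR."""
    return longest_run(bh_reject(pvals_by_lag, alpha)) >= min_run
```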

When quantifying the amount of information transfer, we considered the degree of positive bias caused by a limited amount of sample data. Theoretically, TEshuffled must become 0 because shuffling should disrupt any causality between X and Y. However, the actual TEshuffled was larger than 0 due to biases, which were removed by subtracting the median TEshuffled from the TE. When TE was smaller than TEshuffled, no information transfer was assumed (i.e., TE = 0).

## Normalized TE (nTE)

Mean firing rates of evoked activity were substantially higher than those of spontaneous activity (Fig. 1c and 1d). To eliminate the bias due to differences in mean firing rate, we introduced the nTE. This normalization was necessary when comparing TEs derived from evoked and spontaneous states with different probability densities. The nTE was defined as follows:

$${nTE}_{Y\to X}=\frac{H\left({X}_{future}|{X}_{past}\right)-H\left({X}_{future}|{X}_{past},{Y}_{past}\right)}{H\left({X}_{future}|{X}_{past}\right)}= \frac{{TE}_{Y\to X}}{H\left({X}_{t+lag}|{X}_{t+lag-d}\right)}$$

Practically, the bias of nTE was corrected as

$${nTE}_{Y\to X}= \frac{{TE}_{Y\to X}-{TE}_{Y\to X}^{shuffled}}{H\left({X}_{t+lag}|{X}_{t+lag-d}\right)} \in \left[0, 1\right] \tag{4}$$
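A minimal sketch of the bias-corrected normalization, combining the subtraction of the median TEshuffled, the floor at 0 described in the previous section, and the division by the conditional entropy. The function name and the handling of a zero denominator are assumptions.

```python
import numpy as np

def normalized_te(te, te_shuffled, h_future_given_past):
    """Bias-corrected nTE in [0, 1].

    te                  : empirical TE (bits)
    te_shuffled         : TE values from shuffled surrogates (bits)
    h_future_given_past : H(X_{t+lag} | X_{t+lag-d}) in bits
    """
    corrected = max(te - float(np.median(te_shuffled)), 0.0)
    if h_future_given_past <= 0:
        return 0.0      # assumption: no uncertainty left to explain
    return corrected / h_future_given_past
```

For instance, with TE = 0.3 bits, a median TEshuffled of 0.1 bits, and a conditional entropy of 0.5 bits, the nTE is (0.3 − 0.1) / 0.5 = 0.4.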

## Information transmission in a given pathway

We characterized the information transmission in each pathway as the average of the peaks of nTE among pairs with significant information transfer:

$$\text{Average of nTE peaks} = \frac{1}{n}\sum \left(\text{peak of nTE}\right)\times \frac{n}{{N}_{pathway}} \tag{5}$$

where *n* is the number of pairs with significant information transfer, and \({N}_{pathway}\) is the number of possible pairs of electrodes.
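The pathway measure can be sketched as below (the function name is illustrative). Because the factor *n* cancels, the measure equals the sum of the nTE peaks of the significant pairs divided by \({N}_{pathway}\), so pairs without significant transfer effectively contribute 0.

```python
def average_of_nte_peaks(significant_peaks, n_pathway):
    """Mean nTE peak of significant pairs, weighted by their proportion.

    significant_peaks : nTE peak of each pair with significant transfer
    n_pathway         : number of possible electrode pairs in the pathway
    """
    n = len(significant_peaks)
    if n == 0:
        return 0.0
    mean_peak = sum(significant_peaks) / n
    return mean_peak * (n / n_pathway)
```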

## Role of a given region in information transmission

Based on the average of the nTE peaks defined above, we quantified whether each region (X) served as either a receiver (\({R}_{X}\)) or a sender (\({S}_{X}\)). The metrics \({S}_{X}\) and \({R}_{X}\) were defined as the summation of the average of the nTE peaks as follows:

$${S}_{X}=\sum _{i}\text{average of nTE peaks}\left(X\to {region}_{i}\right) \tag{6}$$

$${R}_{X}=\sum _{i}\text{average of nTE peaks}\left({region}_{i}\to X\right) \tag{7}$$

where \({region}_{i}\) is one of MGv, L2/3, L4, L5, and L6, excluding *X*, and *average of nTE peaks(pathway)* is the average of nTE peaks in a given pathway, as defined in Eq. (5). We then characterized each region *X* using the *SR ratio*:

$$\text{SR ratio}=\frac{{S}_{X}-{R}_{X}}{{S}_{X}+{R}_{X}} \in [-1, 1] \tag{8}$$

A positive *SR ratio* indicated that region *X* served as a sender, whereas a negative *SR ratio* indicated that region *X* served as a receiver.
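As an illustrative sketch, the sender and receiver sums and the *SR ratio* can be computed from a table of pathway values. The dictionary layout keyed by (sender, receiver), the function name, and returning NaN when a region neither sends nor receives are assumptions.

```python
REGIONS = ("MGv", "L2/3", "L4", "L5", "L6")

def sr_ratio(avg_peaks, region):
    """SR ratio of a region from per-pathway averages of nTE peaks.

    avg_peaks : dict mapping (sender, receiver) -> average of nTE peaks;
                missing pathways are treated as 0
    region    : one of REGIONS
    """
    others = [r for r in REGIONS if r != region]
    s = sum(avg_peaks.get((region, r), 0.0) for r in others)   # S_X
    rcv = sum(avg_peaks.get((r, region), 0.0) for r in others) # R_X
    if s + rcv == 0:
        return float("nan")   # ratio undefined without any transfer
    return (s - rcv) / (s + rcv)
```

For example, with `avg_peaks = {("MGv", "L4"): 0.3, ("L4", "MGv"): 0.1}`, MGv has an *SR ratio* of 0.5 (sender) and L4 has −0.5 (receiver).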