Participants
Fourteen healthy, right‐handed volunteers (mean age 23 ± 3 years, mean ± standard deviation, seven females) participated in the study. All volunteers gave written informed consent to participate in this study, the procedures of which were approved by the International School for Advanced Studies (SISSA) ethics committee (protocol number 1899/II-16) in accordance with the Declaration of Helsinki.
Stimuli and Procedure
We used an auditory temporal reproduction task, in which subjects were asked to reproduce, by pressing, holding down and releasing a response key, the duration of a sound (a pure tone 1000 Hz in pitch) delivered via headphones. The beginning of a trial was indicated by a visual cue, an ‘X’ (2o of visual angle), presented on the screen placed at the posterior end of the MRI bore and lasting for 0.2s (Fig. 1A). After a brief post-cue period (1.3–2.3 s), a single pure tone was played via headphones for a variable duration (ranging from 0.32 to 1.1 s). After an interval ranging from 2 to 4 s, a burst of white noise presented for 0.1s instructed the subjects to reproduce the previously heard sound by pressing and holding down a response key. The subjects did not receive any feedback on their performance after their response. After the response, the next trial started following an inter-trial interval ranging from 0.3 to 0.5s. Occasionally, the subjects were randomly presented with catch trials in which only the pure tone was presented. In the catch trials the subjects did not reproduce the duration.
Subjects were tested separately in two temporal contexts. In the short context, the sound’s duration was either 0.32s, 0.46s or 0.65 s, in the long context it was 0.65 s, 0.85 s or 1.1 s. 0.65 s was presented in both contexts and it was the shortest duration in the long context and the longest duration in the short one. Every fMRI run consisted of 54 experimental trials and 9 catch trials, with 18 experimental trials and 3 catch trials for each duration; all trial types and durations were presented randomly. We collected 3 fMRI runs for each temporal context. The 3 runs of each context were always presented in sequence, whereas the presentation order of short and long context was counterbalanced across subjects. A total of 378 trials were collected for every subject, with 189 trials for each context and 63 trials for each of the 6 durations. The experimental paradigm was designed and presented using the Psychophysics toolbox [33] in Matlab (The Mathworks, Inc.).
Behavioral Data Analysis
For each participant and each duration of the two temporal contexts, we took as a measure of accuracy the reproduced duration, which was the time between response key press and response key release. To check for significant differences in the reproduced duration between the two contexts, the individual reproduced durations were entered in a repeated measures ANOVA with two contexts (short and long) and three durations (short, intermediate and long duration) as factors. As post-hoc tests we used paired t-tests in which the alpha level was set to 0.05.
MRI Acquisition
Blood oxygenation level-dependent (BOLD) functional imaging was performed using an actively shielded, head-only 7T MRI scanner (Siemens, Germany), equipped with a head gradient-insert (AC84, 80 mT/m max gradient strength; 350 mT/m/s slew rate) and 32-channel receive coil with a tight transmit sleeve (Nova Medical, Massachusetts, USA). The ultra-high magnetic field system allowed us to have voxels with smaller size compared to lower field MRI thus increasing the spatial resolution of the functional data. Moreover, in 7T systems the signal strength of venous blood is reduced due to a shortened relaxation time, restricting activation signals to cortical grey matter which results in a better signal-to-noise ratio[34]. Time-course series of volumes were acquired for each run using the multiband sequence. The spatial resolution was 1.5 mm isotropic, the volume acquisition time (TR) was 1368 ms, the flip angle was 60 degrees, the echo time (TE) 23 ms and the bandwidth 1903 Hz/Px. The matrix size was 146 x 146 x 75, resulting in a field of view of 219 (AP) x 219 (RL) x 112.5 (FH) mm. An undersampling factor 0 and CAIPIRINHA shift 3 were used. Slices were oriented transversally with the phase-encoding direction anterior-posterior. 146x42x75 reference lines were acquired for the GRAPPA reconstruction.
High-resolution whole-brain MR images were also obtained using the MP2RAGE pulse sequence optimized for 7T (voxel size = 0.60 x 0.60 x 0.59 mm, matrix size 320 x 320 x 256, TI1/TI2 =750/2350ms, α1/α2 = 4/5 degrees, TRMP2RAGE/TR/TE = 5500/6000/4.94 ms).
fMRI Preprocessing
Functional imaging data were preprocessed using the Statistical Parametric Mapping (SPM12 v. 7219, Wellcome Department of Imaging Neuroscience, University College London) toolbox in MATLAB. In each individual subject the EPI volumes acquired in the different runs were first realigned. The runs were first realigned to each other, by aligning the first scan from each session to the first scan of the first session. Then the images within each session were aligned to the first image of the session. The realigned images were then co-registered to the T1-weighted image acquired in the same session. The subject’s images in native space realigned and co-registered to the T1-weighted image were next smoothed with a 2 mm full-width at half-maximum Gaussian kernel.
GLM analysis
The fMRI time series were analyzed at individual subject level using a univariate GLM approach. The events of interest in the GLM analysis included the offsets of the three durations in the two contexts and the onset of the response (i.e., onset of the keypress). We also modelled the visual cue onset signaling the beginning of each trial and the six motion correction parameters as effects of no interest. The duration of all events was set to zero. The individual GLM included the six fMRI runs, three for each context, and each run had 13 regressors (7 of interest and 6 of no interest). All events were convolved with the canonical hemodynamic response function (HRF). The fMRI time series data were high-pass filtered (cutoff frequency = 0.0083 Hz). Correction for non-sphericity [35] was used to account for possible differences in error variance across conditions and any non-independent error terms for the repeated measures. To identify the brain areas exclusively active at the offset of the encoded sound (and not during reproduction) independently from the different durations and the two contexts we contrasted duration and response (duration offset - response onset, resulting in one t-contrast for each subject) and we averaged across durations and contexts. To identify the presence of auditory chronomaps in each temporal context i.e., voxels exclusively active at the offset of the sound, but not during the reproduction, and maximally activated by each specific duration we used as contrast of interest sound offset - response onset for each sound and context (resulting in 6 t-contrasts for subject). In all t-contrasts, pFWE < 0.05 corrected for multiple comparisons across the whole brain.
Winner-take-all. To appreciate the existence of chronomaps in each temporal context, the three t-maps, obtained at single subject level and for each context were then used to classify the voxels according to their preference to one of the 6 different durations (three durations in the two contexts). Voxels were classified according to a “winner take all” rule (WTA), for example voxels with the greatest t value (threshold was set to t> 3.13), for the shortest duration range in the short context (0.32 s) were classified as responsive to that duration range and labeled with number 1. We created 6 different labels corresponding to each duration in the two contexts i.e., 0.32s, 0.46s, 0.65 s (SC), 0.65 s (LC), 0.85 s and 1.1 s. For WTA we used only the clusters of voxels that were significant at p<0.05 cluster-level corrected for multiple comparisons across the whole brain.
Second-level GLM analysis. To make sure that the activations observed at individual level also emerged as a statistically robust group level result we also ran a GLM 2-level analysis. For each subject, we computed a GLM analysis on MNI normalized images. The GLM analysis was identical to that described earlier. The MNI normalization was computed using the SPM 12 pipeline. The Realigned and normalized images were smoothed with a 4 mm full-width at half-maximum Gaussian kernel.
For each subject we estimated 12 t-contrasts, one for each duration and each response in the two temporal contexts, averaging across the 3 runs of each context. The resulted contrast images (t-statistics maps) were then used in a second-level ANOVA where we contrasted all durations against all responses (6 durations – 6 responses). The statistical threshold was set to p < 0.05 FWE cluster-level corrected for multiple comparisons across the entire brain volume (cluster size estimated at a voxel level threshold p-uncorrected = 0.001). Correction for non-sphericity (Friston et al., 2002) was used to account for possible differences in error variance across conditions and any non-independent error terms for the repeated-measures.
Anatomical image processing
The high-resolution MP2RAGE images were analyzed using Freesurfer software [36] (http://surfer.nmr.mgh.harvard.edu/). Freesurfer’s automatic pipeline performs the volumetric segmentation of the MRI data, the surface reconstruction of inflated surfaces, the flattening of cortical regions of interests, the cortical parcellation, and the neuroanatomical labelling with the Freesurfer/Destrieux atlas [37].
Morphing using Freesurfer. Visualizations and computations requiring moving surface data from different subjects into a common surface were performed using the Freesurfer operation “mri_surf2surf”. The source surface was the surface closest to the mean surface estimated across subjects. For example, when morphing data of the SMA Freesurfer label from different subjects into one destination subject space, the SMA label area was first estimated for all subjects. The destination subject space was the subject with the SMA label closest to the mean of the SMA labels across subjects. This method of morphing ensures the best transformation of data from multiple sources to a single destination space.
Surface-based quantification of chronomaps spatial progressions
Chronomaps were visualized and their metrics estimated on inflated and flattened cortical surfaces.
The areas where we explored the existence of chronomaps, which were significantly active at the offset of all sounds and contexts in all subjects (see S-Fig1) were called Region of Interest (ROI). These were SMA, IPS and parabelt areas.
Chronomaps were identified in the left and the right hemisphere of each individual subject using the SPM t-maps resulting from the WTA method. These volumetric maps were projected onto the cortical surface of each individual brain following the Freesurfer pipeline (with a projection fraction set to 0.5). Individual chronomaps for short and long context separately were identified in SMA, IPS and parabelt areas of both hemispheres. Maps in all subjects, ROIs and hemispheres were visually identified when there was a clear spatial progression of duration preferences from short to long durations. Maps’ borders were manually drawn at the edge of the clusters of vertices (vertices are voxels projected into the cortical surface) which preferred the longest and the shortest duration of the range.
The spatial progression was quantified on flattened surfaces as the normalized distance (nD) of each duration selective vertex from the shortest edge for the map. The normalized distance was defined as:

The distance was computed for each vertex in the cluster and then averaged across vertices of the same cluster. The average vertex distance was then estimated for each duration selective cluster of vertices in the two contexts in all subjects and tested for statistical significance using a t-test. In each individual subject, a slope of the spatial progression of duration selective vertices was also computed.
To quantify the spatial shift of the duration selective clusters active for the duration shared between the contexts, for each individual subject and for each context separately we computed the distance of the 0.65 s cluster (averaging the distances of all vertices within the cluster) from the shortest edge of the map and we then compare it across subjects using a t-test. To check for the presence of a correlation between spatial shifts in the cortical representation of 0.65 s and reproduction of this duration across contexts, we plotted the difference in the reproduced duration of 0.65 s in the two contexts (SC - LC) against the absolute difference in distance of the 0.65 s duration selective clusters in the two contexts.
SMA chronomap. In SMA, whose location was double checked with the Freesurfer “BA6” label of both the hemispheres, we defined chronomap’s orientation as the spatial progression of clusters of vertices showing duration preferences. A flat surface of the “BA6” label was created for each hemisphere and subject.
IPS chronomap. Chronomaps in the IPS were identified using the Freesurfer’s IPS label (“S_intrapariet_and_P_trans” label) for each subject. For each subject and hemisphere, a flat surface representation of the IPS label was then created to compute the map’s attributes. To have a more data-driven approach in determining the chronomap progressions and orientations we developed an octagonal search method to determine the best map orientation in the IPS. An octagonal search grid was assumed over the IPS area. The eight sides of the octagon then served as the borders or edges for possible test orientations, resulting in four pairs of shortest-to-longest borders. The primary orientation was assumed to be orthogonal to the postcentral gyrus (poCG). The orientations were defined as relative to this primary orientation axis, with the remaining three axes of the octagon at 45o, 90o and 135o. For each context, the average vertex normalized distances (nD) were computed across the durations and octagonal test orientations. For every test orientation, a slope was computed from the nDs of the different durations in the two contexts. The slope reflected how well the duration clusters were topographically organized in that orientation. The winning map orientation for a given subject and hemisphere was the orientation resulting in the steepest slope. This method of using a common anatomical reference makes the resulting map orientations comparable across the subjects.
Parabelt chronomap. The auditory parabelt was defined as the ROI including the following Freesurfer labels: “G_temp_sup-Lateral”, “S_temporal_transverse”, “Lat_Fis-post”, and “G_temp_sup-Plan_tempo”. A similar octagonal search grid method used with the IPS chronomaps was applied to the auditory parabelt maps for each hemisphere and subject. The primary orientation here was assumed to be orthogonal to Heschl's gyrus (HG). For every test orientation, a slope was computed from the average nDs of the different durations in the two contexts. The winning orientation for a given subject and hemisphere was the orientation resulting in the steepest slope.
Overlap between the two context maps. To visualize the extent of segregation and overlap between the short and long contexts, the two context maps in each ROI (i.e., SMA, IPS and parabelt) were visualized together on individual cortical surfaces. When the two context maps were visualized at group level, for each subject, only the hemispheres with the winning orientation were overlaid on the common surface space. The dominant orientation in SMA was the anterior-to-posterior (N left hemispheres = 19, N right hemispheres = 12), in IPS it was the orientation orthogonal to the poCG (N left hemispheres = 8, N right hemispheres = 6) and in the parabelt areas it was the orientation parallel to the HG (N left hemispheres = 4, N right hemispheres = 3).
Moreover, to quantify the amount of overlap between the maps in the two contexts, we computed the difference of the shortest and longest edges of the maps in two contexts i.e., SC-LC. This difference was computed for each ROI in each individual flat surface using only the maps where both short and long contexts had the same dominant orientation. We first estimated the distance of each map border (i.e., shortest and longest edge) from an anatomical landmark, which was pCG, poCG, HG for SMA, IPS and parabelt, respectively. We then subtracted this distance value of the long context from the same distance value of the short context (SC-LC).
Duration tuning analysis
We checked the response properties of duration selective clusters of voxels by also looking at the BOLD response in those clusters to preferred and non-preferred durations within and across contexts. In each subject, ROI and context to avoid circularity, the duration selective clusters of voxels were identified in one run and the hemodynamic response of those clusters was extracted from the remaining two runs (in all possible combinations). The duration selective clusters from a single run were identified using the GLM analysis and WTA approach, as described earlier (see GLM and winner-take-all analysis). For each cluster of duration-selective voxels the normalized hemodynamic response was estimated as:

Where, x(t) is the signal in each voxel and MB is the baseline that was obtained averaging the signal of t for each run. Normalization was performed by subtracting the signal in each voxel from a baseline value and dividing it by the baseline. The BOLD response was aligned to the second volume (i.e., a TR) after the duration offset (see also 6). Within a single subject, we first averaged the BOLD signal across the voxels of a cluster and then across the fMRI runs.
Multivariate Pattern Recognition Analysis (MVPA)
The multivariate pattern analysis (MVPA) was performed using the CosmoMVPA toolbox[38] in MATLAB (Matlab Inc.). For the MVPA analysis the fMRI time series were reanalyzed as before (see GLM analysis) but in the fMRI preprocessing the realigned and co-registered images were unsmoothed. The GLM analysis, as specified above, included the six runs from the two contexts. Each run included 7 events of interest and 6 events of no interest. The modelled events included the visual cue onset, the three sounds offset, the three response onsets and the six motion correction parameters. The beta values associated with the offset of the six durations (three for each context) were then used for the MVPA analysis.
Predicting the different durations in the two contexts. In the first place we used MVPA to check whether from the activity of SMA, IPS and parabelt areas we could predict the three different durations in the two contexts. To this purpose we used a leave-one-run-out cross-validation approach. For each subject and in each ROI, a support vector machine (SVM) classifier (LIBSVM [39]) was trained to classify the pattern of activity associated with the six durations from two runs (one for each context). The classifier was then tested using the activity pattern from the left-out run. This classification routine was iteratively performed until every run was left out once (3 iterations). The overall classification accuracy was then computed by averaging the classification accuracy from all iterations. The classification accuracies resulting from this analysis were visualized as confusion matrices (chance level is ⅙=0.17). To test if the cross-validation results were specific to the ROIs associated with the timing task, we performed the same analysis on two additional task-unrelated ROIs. The two control ROIs were the occipital pole (OP) and the orbitofrontal cortex (OC). OC was defined using the “G_orbital” and “S_orbital-H_Shaped Freesurfer” labels. While the OP was defined using the Freesurfer “Pole_occipital” label. To compare the prediction accuracy of the different ROIs we used 6 durations by 6 ROIs, repeated measures ANOVA and we used paired t-tests as post-hoc tests. Alpha level was set to p=0.05.
Predicting the contexts. With MVPA we also tested if the pattern of activity in SMA, IPS and parabelt and the two control ROIs could predict the two contexts independently from the different durations. As before, we used a cross-validation approach. Here the SVM classifier was trained to classify the pattern of activity associated with the two contexts from two runs and then tested using the activity pattern from the left out run. A classification accuracy at chance level was equal to ½=0.5.
Double-checking the duration preferences. We performed a complementary MVPA analysis to establish if the duration selective clusters identified with the winner-take-all approach were indeed selective to the assigned duration [40]. This analysis was performed using a cross-validation approach as described before. Here, instead of using the whole ROIs, the cross-validation was carried out separately for each duration selective cluster of SMA, IPS and parabelt areas. To check the decoding accuracy of the different duration selective clusters of voxels, this searchlight analysis was conducted with a search area of 1 voxel that moved across the whole duration selective cluster. For each of the voxels, the cross-validation analysis was carried out using a leave-one-out approach. An SVM was trained to classify the pattern of activity associated with the six durations from two runs and then tested using the response pattern from the left out run. For all subjects and all ROIs, we estimated the classification accuracy of each voxel in all duration selective clusters.
Representational similarity analysis
To measure how the brain activity differed between the six different durations in the two contexts, we analyzed the fMRI time series using a multivariate representational similarity analysis (RSA) [41]. With RSA for each ROI i.e., SMA, IPS, parabelt, we correlated the beta values associated with the offset of each duration with the beta values associated with every other offset duration within and across contexts. To perform this correlation, in each individual subject we averaged the betas corresponding to each duration offset across the three runs of the same context. The correlation was measured with the Spearman correlation coefficient. The resulting correlation coefficients were entered into a representation dissimilarity matrix (RDM) where each entry was created by subtracting the correlation coefficient by 1 and averaging this correlation coefficient across subjects. This value reflects how dissimilar on average each duration representation is from the others.