**Animals**

All animal procedures were approved by the Regierungspräsidium Freiburg, Germany. In this study we used six male Long Evans rats (400 g, Janvier) which were implanted at the age of eight weeks and recorded up to four months after the implantation. Three to four animals were pair-housed in type 4 cages (1500U, IVC typ4, Tecniplast, Hohenpeißenberg, Germany) before implantation and the animals were single housed after the implantation in type 3 cages (1291H, IVC typ4, Tecniplast, Hohenpeißenberg, Germany) under a 12 h light dark cycle (dark period from 8 a.m. to 8 p.m., time span of training and experiments). Prior to the first behavioral training, no behavioral tests were conducted, no drugs were applied and food (standard lab chow) and water were provided *ad libitum*. During the course of the experiment, the animals were maintained with free access to food but water supply was restricted. Rats were kept at > 80 % body weight as measured prior to water restriction. For 2 days per week, free access to water was ensured.

**Animal surgery**

Animals were initially anesthetized with isoflurane inhalation followed by intra-peritoneal injection of 75 mg/kg Ketamine (Medistar, Holzwickede, Germany) and 50 mg/kg Medetomidin (Orion Pharma, Espoo, Finland). The animals were then put into a transportation container covered with an opaque cloth to facilitate the anesthesia. Once the animals were anesthetized, they were positioned in a stereotaxic frame (David Kopf Instruments, Tujunga, CA, USA) and their body temperature was kept at 36 °C using a rectal thermometer and a heated blanket (FHC, Bowdoin, USA). The anesthesia of the animals was maintained with approximately 2% isoflurane and 0.5 l/min O2. For pre-surgery analgesia, we subcutaneously (s.c.) administered 0.05 mg/kg Buprenorphine (Selectavet Dr. Otto Fischer GmbH, Weyarn/Holzolling, Germany). Every other hour, the animals received a s.c. injection of 5 mL isotonic saline. Moisturizing ointment was applied to the eyes to prevent them from drying out (Bepanthen, Bayer HealthCare, Leverkusen, Germany). The skin was disinfected with Braunol (B. Braun Melsungen AG, Melsungen, Germany) and Kodan (Schülke, Norderstedt, Germany). To perform the craniotomy, the skin on the head was opened along a 2 cm long incision using a scalpel. The exposed bone was cleaned using a 3% peroxide solution. Self-tapping skull screws (J.I. Morris Company, Southbridge, MA, USA) for reference for extracellular recordings were placed over cerebellum. Craniotomies were drilled bilaterally extending from -2 to +5 mm in the anterior posterior direction and from +1 to +4 mm in the lateral medial direction relative to Bregma. 22 tungsten electrodes (200 to 600 kOhm impedance, polyimide insulation, WHS Sondermetalle, Grünsfeld, Germany) were implanted at a depth of 1.2 mm in each hemisphere. Electrodes were implanted according to the area borders given by the online brain atlas from Matt Gaidica 40 and CFA and RFA was delineated according to Neafsay and Sievert 41, and Rouiller et al 42 (**Fig. 1E**). Three rows of 6 electrodes each, oriented in the medial-lateral direction, were implanted in the anterior-posterior direction. The fourth and last row consisted of 4 electrodes, oriented in the medial-lateral direction (see **Fig. 1E**). Occasionally, we had to cut some electrode wires, in order to not destroy blood vessels at the implantation site (e.g., rat 221, left hemisphere, last electrode row). Kwik-Cast (WPI, Sarasota, FL, USA) was used to protect the brain from the dental cement applied in the final step. Before, Mill-Max connectors (Mill-Max, Oyster Bay, USA) from each hemisphere were glued together to form a 4 x 13 pin connection matrix. The last and first four pins were connected to the two skull screws over cerebellum to serve as reference and ground. Finally, the assembly was fixed using dental cement (Paladur, Kulzer GmbH, Hanau, Germany).

**Behavioral tasks**

Animals were encouraged to move with as little repetition as possible. In the locomotor task, two servo motors positioned a waterspout at different locations within an arena of 30×40 cm. Every 10 to 30 s a valve ejected a drop of water, which remained in the mesh until the rats consumed it. To prevent the rats from following the movements of the waterspout, we introduced dummy moves: First the waterspout was doing a dummy move without giving water. One second later it did move to a new position where it let out a water drop. The third and last move was again a dummy move. Even for an experienced animal, this procedure resulted in multiple water drops distributed across the mesh at any given time point. The fact that the rats did not collect all water drops indicates that the animals could not predict where the water was let out and had to actively search for it. This task required minimal training as indicated by the stable paw velocities over all sessions. Thus, we used all sessions for data analysis (**Supplementary Fig. 1A**).

In the joystick task, the animals had to learn to grab a joystick-like manipulator as a first step. The manipulator was based on a manipulandum for rodents 43. Instead of having to reach out for the joystick, the joystick was placed right below the right front paw. The naïve rats typically explored the arena in which the joystick was placed. As the animals placed the paw by chance on the joystick, the joystick vibrated and a liquid reward was given as long as three requirements were met: (1) The rats had to keep holding the joystick with the right front paw which we controlled for via force sensors on the joystick. (2) The left front paw had to be placed on a force sensor plate, which was placed to the left of the joystick. (3) The rats’ head had to cross an infrared sensor. This ensured that the animals had to learn to use their right front paw to manipulate the joystick rather than the left paw or the mouth. The vibration of the joystick was implemented by clamping the current of the two motors according to two independent Gaussian processes and served two purposes: (1) it made the animals aware of the joystick. (2) The vibration of the joystick increased in amplitude during the course of 10 s (the maximum vibration amplitude resulted in an average acceleration of 1.5m/s2) such that, unless the animals held the joystick firmly, it would lose the grip and thus not receive rewards. Together, these measures resulted in an automatic training by which the rats learned to hold the joystick during the maximum vibration amplitude within 10 sessions. Once the rats had developed a firm grip of the joystick, the motors were turned off and the rats received a reward when they actively moved the joystick. Moreover, the rats only received rewards when they moved in a direction or to a position which had not been visited recently (see below). The joystick could be moved within an arena of 40x40 mm. This arena was divided into 5×5 bins and the direction of movement was divided into 8 bins. For each bin we stored the amount of remaining reward. Whenever the rats visited one bin, the amount of remaining reward, r, in that bin was decreased to r- Dr. The amount of reward that was decreased, Dr, was distributed among all other bins. Thus, if the rats preferred one bin, the reward within that bin disappeared completely after 20 seconds. It took up to 15 sessions for the animals to start to move the joystick non-repetitively (**Supplementary Fig. 1B**). Before the rats started to move randomly, they typically tried to pull the joystick only in one direction (typically towards the rat). This resulted in minimal overall movements since the joystick was stopped by the edges of the arena (the 40x40 mm arena). Only when they realized that they could move in all different directions, the amount of total movement increased. For data analyses, we used data from sessions 15 to 35.

**Quantifying behavior**

Since the rats had to take a defined pose in the joystick task, we could relate the joystick position and movement to the egocentric coordinates of the rat. To enable a comparison of the locomotor task and the joystick task, it was necessary to quantify the behavioral variables in a similar way. To achieve an egocentric tracking in the locomotor task, we tracked the paws, head, chest, and belly of the animals. By using the head, chest, and belly coordinates, we aligned the movements of the right front paw to egocentric coordinates. The neck velocity was calculated from the head, chest and belly coordinates. Those body parts were tracked by painting them in different colors. The head of the rat did not have to be painted because of the black hood of Long Evans rats. To ensure that all body parts could be tracked, the cameras were placed below the arena. Two to four cameras (Stingray, F033C IRF CSM, Allied Vision Technologies) were used in the locomotor task. The noise of tracking was estimated to 0.79 cm/s (estimated when the paw was standing still on the mesh) and was subtracted from the paw velocity estimates.

**Data acquisition and preprocessing of extracellular recordings**

Extracellular signals were bandpass filtered, amplified and digitized using the INTAN (Intan Technologies, Los Angeles, California) head stage attached to the Mill-Max matrix connector at the head of the animals. To maximize comfort for the animals, we stripped the ultrathin INTAN cable and suspended it with a 1.5 m long ultralight spring with a 1.5 mm diameter. The long recording cable allowed the rats to move between the locomotor task and the joystick task without having to be disconnected and re-connected. The rats could either begin with the locomotor task and after 30 min a door was opened allowing the rats to walk into the joystick arena for 40 to 90 minutes, or the rats were in the joystick arena for the entire session. In case of a dual task session, we always began with the locomotor task, because the color markers used for the locomotor tracking faded over time.

The extracellular recordings were sampled at 30 kHz and were de-noised offline. First, 50 Hz and the corresponding harmonics were removed using a 20 ms template estimation. The activity across all channels was demeaned using a median filter. Spike sorting was conducted on high-pass filtered data with a cut off frequency of 300 Hz. Spike snippets were extracted from peak aligned events that crossed a threshold of four times the standard deviation. Only spikes with a negative peak were taken into account. The spike window was -0.5 to 2 ms around the peak amplitude (resulting in 76 values for each spike). To minimize the risk that a sorted unit was a combination of multiple neurons, we applied a conservative threshold for the cluster size. To this end we used a cluster size that was dictated by the noise level half a millisecond before the minimum of the spike. Given the typical refractory period of neurons, this noise estimate excluded variability caused by this unit and was therefore a direct measure of the cluster size of this particular unit. Since our electrodes typically had a spacing between 300 and 1000 µm, we sorted each electrode separately. The spikes were sorted in the raw 76 dimensional space without dimensional reduction. For each sorted unit, the spike sorting algorithm had two phases. First, the algorithm estimated a suitable seed spike. Second, the corresponding waveform was optimized iteratively until the spike assignments of that unit remained constant. The clustering algorithm selected a seed spike by calculating the average noise level across all units. Afterwards, it randomly chose one spike and counted the number of neighboring spikes within this average noise level. Those spikes were called the spike-neighborhood. This procedure was repeated for 500 randomly chosen spikes in order to maximize the chance of finding a globally optimal seed spike. The spike that had most neighbors was selected as the seed for a unit. In order to optimize this spike seed, the noise level for the neighboring spikes was recalculated, the new neighborhood was calculated given this new noise level, and the new average waveform was calculated. This procedure was repeated until the neighborhood remained constant. The spikes within the noise-defined neighborhood were considered to belong to one sorted unit. For this unit, the spike sorting was finished at this point and it was not considered for further spike sorting. For the remaining spikes, the algorithm re-started phase one and two in order to search the next sorted unit. This procedure was stopped when it resulted in sorted units with spike rates lower than 0.1 Hz.

We regarded a unit as a single unit when the number of spikes within an inter-spike interval of less than 2 ms corresponded to a smaller firing rate than the average firing rate of the unit. To define the degree of decorrelation across neurons, we used the m-rate 29. The m-rate denotes the minimum spike rate in the spike-triggered spike average between two neurons (cross correlogram). The cross correlogram was calculated over a period of -10 to 10 s with a 10 ms binning. We did not calculate the m-rate from a neuron to itself since that would reflect intra-neuronal processing (adaptation and refractory period) rather than the decorrelation of the population. The m-rate corresponds to the average spike rate if the spikes of the two neurons occur independently of each other, and the m-rate would be 0 for the case of a lag with no corresponding spike pairs. The m-rate percentage was calculated by dividing the m-rate with the average firing rate.

**Single and multiunit velocity modulation**

As a general way to relate behavior to neural activity on a single unit or multiunit level, we used a generalized form of spike triggered average of the paw velocity, which we denote as activity weighted distribution (AWD). First, instead of taking discrete spikes, we weighted the behavioral variable (paw velocity or position) with a continuous neuronal activity. Here this continuous activity was the instantaneous firing rate smoothed with a Gaussian kernel with a standard deviation of 50 ms. Second, instead of averaging the behavioral variable, we calculated the distribution for the behavioral variable. A distribution was formed by binning the complete velocity range into 10 equally sized bins. Each bin quantified the average activity across the velocity range of the corresponding bin (See Supplementary Figure 2). In contrast to the linear average in the classical spike triggered average, the distribution of the behavioral variable allowed us to take nonlinearities into account, e.g. exponentially increasing firing rates with linearly increasing velocity. According to a traditional spike-triggered average, the relation between neuronal activity and behavior was calculated at different temporal lags between neural activity and behavior. Here we used lags between -4 and 4 s with a temporal resolution of 10 ms. For large delays beyond 3 s, the neuron was typically no longer modulated by behavior. Here we used the average activity between 3 and 4 s to calculate a baseline activity. This baseline activity was subtracted from the AWD. The average velocity modulation at each lag was calculated by taking the mean of the absolute value of the subtracted AWD (**Fig. 1F and G**). The duration and the lag of the modulation was calculated by first extracting the peak modulation. Then we traced this modulation backward and forward in time until the modulation was less than 80% of the peak modulation. The temporal difference between those two time points was defined as the duration of the modulation (**Fig. 1H, 1I, 2B, 2C, and 2D**). The average between those time points was denoted as the temporal lag of the modulation. We took the average time of the 80% start and stop time since this resulted in a more accurate estimation than the peak time. This was due to the frequent occurrence of plateaus in the velocity modulation. During these plateaus, small fluctuation of the neuronal signal within the noise level can make the peak appear at any time point along the plateau. To determine if a unit was modulated by velocity, we calculated the mean and standard deviation of the velocity modulation at the two extreme lags of the normalized velocity modulation (-4 to -3 s and 3 to 4 s). The normalized velocity modulation was calculated by subtracting and dividing the velocity modulation with the mean and standard deviation, respectively. A unit was regarded as modulated if this velocity modulation was larger than 10 (a.u.). The variability of the velocity modulation was calculated by dividing the firing rate variance with the average firing rate in each bin of the lockup table that is used to calculate the velocity modulation (See Supplementary Figure 2). The normalized variability for each sorted unit was calculated by dividing with the variability at the baseline interval (-4 to -3 s and 3 to 4 s).

**Bootstrapping velocity modulation**

To estimate the variability of the modulation duration we used a bootstrap analysis (**Fig. 1H and I**). Since it would be computationally inefficient to sample from all 10 ms bins with replacement and since 2 neighboring 10 ms bins were not independent, we chose to divide each session into 100 segments of equal size and to calculate the AWD for each such segment. This resulted in segments that were at least 10 seconds long, allowing computationally effective bootstrap sampling. We sampled the corresponding 100 AWDs with replacement and calculated the resulting velocity modulation. This procedure was repeated 100 times. For each repetition, we calculated the modulation duration. Afterwards, we calculated the standard deviation across those repetitions.

**Population correlation analysis and trial definition**

The population correlation analysis was performed on normalized neural activity. For each unit, we divided the spike trains into 10 ms bins, subtracted the average firing rate and divided each bin by the standard deviation of the binned activity. This normalized data was organized into a matrix with as many rows as there were units and as many columns as there were time bins. To prepare the data for the correlation, we normalized each column to have an average of 0 and a Cartesian norm of 1 (unit length). Finally, we removed a global population activity that could otherwise bias the correlation analysis. During short periods of time (between 500 ms to 10 s) sometimes the animals suddenly froze (both in the joystick and the locomotor task) which resulted in a correlated population activity across the joystick and the locomotor task (average R=0.5). Since this activity was correlated across two fundamentally different tasks, it was more likely to reflect a global state change rather than a planning process, which in turn could bias the population correlation. Therefore, we minimized the contribution of this freezing related population activity, p, by correlating the population activity at each time bin, at, with the population activity, and subtracting the population activity according to this correlation: at – p(at*p), where * is the scalar product.

With this normalized activity, we calculated the scalar product (Pearson correlation coefficient) between two population vectors at 2 different time points (**Fig. 3C and D**). We only correlated population vectors within a trial. Since our behavioral data was not separated into defined trials, we constructed trials using the paw velocity. First, we filtered the paw velocity with a Gaussian kernel of 2 s full width half maximum (FWHM). To find trials for which a period of low behavioral activity was followed by a period of high behavioral activity, we divided each time point in the filtered velocity by each time point in the filtered velocity 2 s earlier. If this ratio was larger than 2 and if this ratio was a local maximum across time, this was regarded as the central time point of a trial. A trial was then defined as 8 s before and 8 s after this maximum. This resulted in 1601 bins of 10 ms in one trial. The correlation was calculated between all 1601×1601 pairs of time points within a trial. Finally, as the population vector at one reference time point was correlated with the population vector at all other time points, the correlation would decay with increasing distances from the reference time point. This decay was fitted by an exponential function using nonlinear optimization with a Gaussian cost function (**Fig. 3E, F, G and H**). This population correlation decay is a measure of the frequency characteristics of the population dynamics: one over this time constant defines the frequency for which a low pass filter with that time constant attenuates the amplitude to 16% of the original amplitude.

To estimate the behavioral frequency at each time point, we calculated the maximum behavioral frequency (within a window that was inversely proportional to the frequency) that was required for describing the behavior within the error bounds of the tracking.

**Behavioral impact on population correlation**

To test how well the neurons encoded for position (**Fig. S2B**), we divided the egocentric x and y movement coordinates of the right paw into five equally sized bins between the minimum and maximum position value. This resulted in a 5 x 5 element matrix. For each element in this matrix we calculated the average firing rate of the neuron when the paw was in the corresponding position within ±50 ms. We used this matrix as a lookup table to estimate the instantaneous firing rate at each 100 ms time bin, given the position at the corresponding time bin. The resulting time course of the firing rate was correlated to the time course of the true instantaneous firing rate binned in 100 ms bins. The same analysis sequence was conducted for x and y velocity.

**Subthreshold reconstruction**

The subthreshold reconstruction algorithm, SubLab, has been described in detail elsewhere 29. In short, the algorithm uses the spikes of one unit (target unit) to reconstruct its subthreshold activity by means of the spiking activity of the remaining units (input units). The algorithm differs from recent auto-encoders and dimension reduction techniques in three aspects: (1) it does not assume an even distribution of spikes in time (Poissonian or Gaussian models); (2) (subthreshold) activity is not modified, as long as it does not cross the threshold; (3) the algorithm reconstructs the subthreshold activity individually per neuron and, therefore, does not impose any relation between units. Here we used 10 training epochs and we ran the reconstruction on complete sessions.

We also tested the LFADS auto-encoder algorithm, since it does not require a trial structure and since it can fit complex dynamics to spiking data. For our data, LFADS smoothed the spike trains in a piecewise continuous way. We observed gaps in the smoothed spike trains. We suspect that these gaps were due to the spontaneous and complex behaviors, which in turn caused the internal states to be reset frequently.

The reconstructed activity was filtered in the following way (**Fig. 4C, D, E and F**). High pass filtering: First, the reconstructed signal was smoothed with a Gaussian kernel with a standard deviation (σ) of 0.14 s. Using the cut-off frequency formula for Gaussian filtering (2πσ)-1, this corresponds to a cut off frequency of 1.1 Hz. Second, we subtracted this smoothed signal from the original reconstructed signal. Band-pass filtering: First, the reconstructed signal was smoothed with a Gaussian kernel with a standard deviation of 0.057, 0.14, 0.28, 0.57, 1.4, 2.8, and 5.7 s (2.8, 1.1, 0.57, 0.28, 0.057, and 0.028 Hz), respectively. Second, we subtracted this smoothed signal from the original reconstructed signal. Third, the resulting signal was smoothed with a Gaussian kernel with a standard deviation of 0.014, 0.035, 0.071, 0.14, 0.35, 0.71, and 1.4 s (11, 4.5, 2.2, 1.1, 0.45, 0.22, and 0.11 Hz), respectively. Low pass filtering: The band-pass filtered signal that was filtered with a low-pass kernel of 0.71 seconds (0.22 Hz) and high-pass kernel of 2.8 seconds (0.057 Hz) was referred to as the low-pass filtered signal. The additional high pass filtering minimizes the influence from strong low frequency components. Finally, to get the energy of the filtered signal, we calculated the absolute value of the high-pass filtered signal.

**Relating population and frequency coding**

Output-null and output-potent coding has traditionally been studied during the planning and execution phase of an instructed delay tasks. Since our behavioral setting does not include a typical trial structure, we defined the planning and execution phase in terms of the lag between the paw velocity and the neuronal activity. To this end, the anterior-posterior paw velocity was multiplied with the neuronal activity for a sorted unit in a bin-wise manner for a given lag and averaged across all bins. Thus for a given lag, this approach will quantify how each neuron codes for the movement in a linear manner. We used temporal lags from -1 second to 1 second with 10 ms bins. The result is a N x 201 dimensional matrix for each session, where N is the number of sorted units. Dimension reduction to a 2x201 matrix was achieved by taking the largest two principal components. We defined the output potent space as a one-dimensional space covered by the vector between origo (0, 0) and the point in the two dimensional space (spanned by the first two principal components) at lag of -40 ms. This lag was defined by the notion that executional activity should have a small correlation to the planning activity, which in turn translates to the lag at which the correlation to the average activity between -1000 and -200 ms (putative planning activity) was smallest. We choose the upper limit to be -200 ms to minimize the bleeding into executional activity18,19. Since this definition of the lag for the output potent space is maximizing the separation of planning and executional activity it is maximizing the chance that the null and potent spaces will be found. Such a biased definition is justified here since the aim is not verify the null space theory but rather to see if it is related to the ratio of high and low frequencies of neuronal changes. The output null space was orthogonal to this output potent space. For each lag, we estimated which state the neuronal activity was closest to by means of the difference in magnitude: abs(output potent)-abs(output null). If this __state tuning__ was positive we regarded the neuronal activity to be in the output potent state and if it was negative we regarded the neuronal activity to be in the null space.

To test if the frequency coding can predict whether the neuronal activity is in the output potent or output null space, we assigned the frequency preference for each lag of a certain session. This was done by calculating the difference in magnitude: abs(amplitude of high frequency) – abs(amplitude of low frequency). If this __frequency tuning__ was positive, it means that the neuronal activity had a larger high frequency component and if it is negative it means that the neuronal activity had a larger low frequency component. We pooled all lags (across all sessions) that had a positive frequency (or negative frequency) tuning and calculated the resulting average state tuning.

**Behavioral quantification during optogenetic stimulation**

For optogenetic stimulation we used a 200 mm fiber implanted at 1 mm depth in the primary motor cortex of two rats (511 and 512) (AP=0.5, LM=2, and DV=1). The viral vector AAV5 carrying the construct hSyn-hChR2(H134R)-eYFP-WPREpA (UNC vector core, Chapel Hill, NC, USA), was injected at a depth of 1.5 mm with a volume of 1ml. Each stimulation trial lasted 10 seconds and the light intensity was sinusoidally modulated according to one of five frequencies: 0.1, 0.3, 1, 3, and 10 Hz with a peak power of 4-12mW at the fiber tip. Since the current that ChR2 can give rise to is smaller for low frequencies than for high frequencies, we compensated with a stronger light intensity for the lower frequencies 44. Each trial was randomly interleaved with 120 to 240 seconds.

To quantify the subtle paw movements that results from sinusoidal optogenetic stimulation (Fig. 6C), we first calculated the paw position using FreiPose 39. For a given trial we manually selected the camera with the clearest view of the right foot (the optogenetic stimulation was in the left hemisphere). The paw position for each frame was then projected to this camera and a 100x100 pixels window was cut out around this projected position. The optical flow was calculated for each pair of neighboring frames (opticalFlowHS object in Matlab). The paw position for both frames in this pair was taken according to the first frame. The vertical component of the optical flow was extracted since this is the major movement axis during stimulation. Finally, the optical flow was only sampled at pixels with a saturation above 20% (0.2 for saturation of the rgb2hsv function in Matlab). This was done in order to sample paw movements rather than more unspecific “fur” movements. Trials in which the rat was grooming, eating or walking was eliminated from further analysis. The amount of movement for each stimulation frequency was then calculated by averaging the energy in the 0.1, 0.3, 1, 3 and 10Hz band (using the spectrogram function in Matlab with window size of 100 and overlap 99).

In addition to the automatic behavioral quantification described in the previous paragraph, in figure 6D we manually quantified how the animal responded to the optogenetic stimulation. To this end we measured the duration for which the rat performed an abnormal behavior. Abnormal behavior was defined as a paw movement for which the rat was lifting and lowering the right paw towards the original location at least two times. We excluded movements that could be ascribed to walking, grooming or movements that showed a coordination between left and right paw. Although the criterion seemed robust, there was one trial in which rat 512 stretched out the paw abnormally for the 10 Hz stimulation and this was therefore not counted as a cyclic movement.

**Encoding and decoding**

Encoding and decoding of the paw velocity was done by multiplying a temporal kernel with the paw velocity (for encoding) or with the neuronal activity (for decoding). The temporal kernel spanned from -2 seconds to 2 seconds with a temporal resolution of 10 ms. The weights of the temporal kernel was optimized with a least-squares method in order to follow the behavior (for decoding) or the neuronal activity (for encoding).

For encoding the paw velocity in the X and the Y direction was used to predict the neuronal activity for each sorted unit. Thus, for the encoding the kernel was a matrix with Tx2 weights (2x401 samples, see above).

We did two types of decoding. For testing the frequency dependency of decoding performance we used a single sample kernel with the weight at time 0 relative to the behavior, or a double sample kernel with the two weights at -1/frequency and 0. The frequency corresponded to the center frequency of the bandpass filter neuronal activity. Thus, for the frequency dependent decoding the kernel was a matrix with 1xU weights, or a matrix with 2xU weights, where U corresponds to the number of units.

For extracting the unit specific decoding kernel we used one temporal kernel (401 samples, see above) for each paw velocity direction. The temporal kernel of one unit was optimized independently of the temporal kernel of another unit. Thus, for the unit specific decoding kernel each kernel was a matrix with Tx1 weights, where T corresponds to the number of weights in the temporal kernel (i.e. 401). The wavelet analysis of the decoding kernels was done using the Matlab command cwt with symmetry parameter 1.5 and time-bandwidth product 2.

**Statistical procedures**

All statistics and graphical illustrations of spiking unit data have been corrected for the possibility that the same unit has been recorded during multiple consecutive days (**Supplementary Table 3**). In motor cortex, evidence has been provided that tungsten electrodes are able to record the same unit for an average of three days, and a considerable amount (11%) of neurons could be recorded for up to seven days 45. Since we had a recording session almost every day we conservatively regarded every 7th unit to be an independent data sample. To this end, the degrees of freedom were calculated on the basis of the unit count divided by 7. We made this correction for the t-test, the Pearson correlation coefficient, and the ANOVA. For box plots (using Matlab’s boxplot function), we plotted the bootstrapped data (using Matlab’s bootstrap function with 1000 iterations) and adjusted the standard deviation of the bootstrapped data such that it was times that of the original data. In addition to this correction for independent data samples was also applied a mixed effect model. The modulation duration difference between cortical areas was also tested using a linear mixed effect model for which the areas were modelled with an additive random effect and the cortical location was modeled with a fixed effect. All responses from a given electrode were averaged across sessions for a given animal.

For statistical testing, we assumed that the data was normally distributed. The test statistics for the Pearson correlation coefficient, the ANOVA and unpaired statistics approached a normal distribution for large data samples. For the paired t-test, we assumed a normal distribution as the test distribution was symmetric around 0. Unless otherwise stated, samples were described as mean and standard deviation of the mean.

Since we had one less animal in the joystick task (animal 220 lost the implant before it learned the joystick task), all paired tests were done without animal 220 in both the joystick and locomotor task. The non-paired tests were done using all 6 animals in the locomotor task and all 5 animals in the joystick task.