Design
The pediatric concussion data were collected as part of a single-center, prospective cohort study on which the senior author was one of the lead investigators. The control data were retrieved through an open-source neuroimaging database, namely the Autism Brain Imaging Data Exchange II (ABIDE-II) repository. Open-source control data were used to increase our sample size. This study (and in particular the parent, prospective study on concussion) was approved by the Hamilton Integrated Research Ethics Board (HiREB).
Details on the following sections are reported in our prior publication on sex effects in resting-state brain activity in pediatric concussion (31). However, a brief description of each section is provided below, in addition to a more complete discussion of the analysis (which departs from the manuscript cited above).
Participants
The pediatric concussion participants (aged 9-17, n = 28) were prospectively recruited by our clinical team at McMaster University and affiliated sites, such as McMaster Children’s Hospital. All participants received a diagnosis of concussion by a clinical team member with expertise in the area, and they were (on average) one-month post-injury at the time of scanning. Most children had a sport-related concussion. Exclusion criteria included: more severe brain injury, multiple-systems injuries involving the head, and prior neurological history.
The healthy control data were obtained through ABIDE-II (n = 379), which contains >500 healthy control rsfMRI scans collected across 19 sites using 3.0T scanners (32). All pediatric rsfMRI scans matching the age range of our clinical group were considered for inclusion. Scans were included in our analysis if they met quality assurance criteria (with respect to signal-to-noise ratio, percentage of outlier scans, and data smoothness) and contained ≥180 timepoints.
MRI acquisition
Concussion subjects were all scanned at the Imaging Research Centre at St. Joseph’s Healthcare, Hamilton. All participants were scanned using the same hardware, namely a 3.0T GE Discovery MR750 scanner and a 32-channel phased-array receive coil. The data acquisition (detailed in full in our prior report (31)) in brief involved: a 3-plane localizer; a 3D inversion recovery-prepped SPGR T1-weighted sequence (TR/TE = 11.36/4.25 ms, TI = 450 ms, flip angle = 12°, 512 × 256 matrix interpolated to 512 × 512, 22 cm axial field of view, 1 mm slice thickness); and rs-fMRI (gradient echo EPI, TR/TE = 2000/35 ms, flip angle = 90°, 64 × 64 matrix, 180 time points, 3 mm slice thickness, 22 cm field of view). For all concussion subjects, a B0 map (in Hz) was acquired (following the geometric prescription of the rsfMRI) using a tool available on the scanner, and the clinical data were B0 fieldmap corrected using the `epiunwap` package (33). The control data were downloaded (provided they met the quality assurance checks described above) and entered into our pre-processing pipeline, as per the following section. The control data did not include B0 fieldmaps, so we were unable to perform such correction on those scans. However, analyses from our lab suggest that while B0 correction is ideal, its absence may not compromise the integrity of analyses using large open-source neuroimaging samples (34).
rsfMRI pre-processing
All pre-processing was performed in CONN 21a, which uses functionality from SPM12 and MATLAB. The pre-processing pipeline included the following steps: functional realignment and co-registration to the anatomical data; warping into MNI space (using 1 mm and 2 mm isotropic voxels for anatomical and functional data, respectively); slice-timing correction; outlier detection using SPM’s Artifact Detection Tool; segmentation and normalization; and spatial smoothing (8 mm FWHM). De-noising was then performed using CONN’s anatomical component-based noise correction procedure (aCompCor), which projects out noise associated with subject motion, outlier scans, white matter, and cerebrospinal fluid. Temporal filtering was then performed to remove frequencies outside of the 0.008-0.1 Hz range. Cortical and sub-cortical anatomical regions of interest (ROIs) were extracted using the Harvard-Oxford atlas included in the CONN installation, which identifies 164 anatomical ROIs.
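For illustration only, the temporal filtering step above can be sketched as a zero-phase Butterworth band-pass retaining 0.008-0.1 Hz at the study's TR of 2 s (this is a generic sketch, not CONN's internal implementation; the placeholder time series is synthetic):

```python
import numpy as np
from scipy.signal import butter, filtfilt

tr = 2.0                     # repetition time in seconds
fs = 1.0 / tr                # sampling frequency = 0.5 Hz
nyquist = fs / 2.0

# Second-order Butterworth band-pass over the 0.008-0.1 Hz band
b, a = butter(2, [0.008 / nyquist, 0.1 / nyquist], btype="band")

rng = np.random.default_rng(0)
bold = rng.standard_normal(180)      # placeholder ROI time series, 180 volumes
filtered = filtfilt(b, a, bold)      # filtfilt applies the filter forward and
                                     # backward, giving zero phase distortion
```

Zero-phase filtering is preferred here because a one-pass filter would shift BOLD events in time relative to the unfiltered series.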
The BOLD signal for each subject and each ROI was extracted from CONN after pre-processing. All metrics of interest (mean, standard deviation, sample entropy, Lyapunov exponent, ALFF, fALFF; see Table 1 for more details) were computed in MATLAB on the extracted BOLD signals. Specifically, the mean, standard deviation, and Lyapunov exponent of each time series were calculated using MATLAB’s mean(), std(), and lyapunovExponent() functions, respectively, the latter with fs = 0.5 Hz (i.e., the sampling frequency of the data). A sample entropy algorithm (35,36) was written in MATLAB and applied to the BOLD signals with dim = 3 and r = 0.6 (values taken from (35), as that work was optimized for rsfMRI analysis), where dim is the length of the template window and r is the tolerance, i.e., the maximum distance permitted between data points in two windows for them to be counted as a match (37). ALFF and fALFF values were extracted from CONN.
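The original sample entropy code was written in MATLAB; a minimal Python sketch of the same algorithm, with dim and r as described above (and r interpreted as a fraction of the signal's standard deviation, a common convention), is:

```python
import numpy as np

def sample_entropy(x, dim=3, r=0.6):
    """Sample entropy of a 1-D signal.

    dim : template (embedding) window length
    r   : tolerance; two windows "match" when their Chebyshev
          distance is <= r * std(x)
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    tol = r * np.std(x)

    def count_matches(m):
        # Use the same number of windows (n - dim) for lengths m and
        # m + 1 so the two counts are directly comparable.
        windows = np.array([x[i:i + m] for i in range(n - dim)])
        count = 0
        for i in range(len(windows) - 1):
            # Chebyshev distance to every later window (no self-matches)
            d = np.max(np.abs(windows[i + 1:] - windows[i]), axis=1)
            count += np.sum(d <= tol)
        return count

    b = count_matches(dim)        # template matches of length dim
    a = count_matches(dim + 1)    # template matches of length dim + 1
    return -np.log(a / b)
```

A predictable signal (e.g., a sinusoid) yields a sample entropy near zero, while white noise yields a larger value, matching the "disorder" interpretation in Table 1.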
Table 1. Metrics computed on the extracted BOLD time series.

Metric | Description | Interpretation in rs-fMRI
Mean | The average of the BOLD signal over the course of the time series | The average % change in BOLD response over the time series.
Standard Deviation | The dispersion of the BOLD signal around the mean | The variation of the % change in BOLD response from the mean throughout the time series.
Sample Entropy (SampEn) (37) | Generally defined as the lack of order/predictability in a system | A measure of disorder across the BOLD time series, i.e., whether the signal is predictable or “random”. Can also describe the complexity of the signal.
Lyapunov Exponent (LyaExp) (38) | Measures the rate of divergence of trajectories in phase space; can be used as a measure of chaos in the system | Describes how chaotic the BOLD signal is, i.e., how dependent future time points are on the present response.
Amplitude of Low-Frequency Fluctuations (ALFF) (39) | The amount of low-frequency components (0.001-0.01 Hz) in a temporal signal’s frequency spectrum | Low-frequency fluctuations of the BOLD response are thought to correlate with spontaneous neuronal activity.
Fractional ALFF (fALFF) (39) | A normalized version of ALFF, in which the low-frequency power is divided by the power across the entire frequency spectrum | Similar to ALFF, but the normalization is thought to make fALFF more robust for group analyses, at the expense of sensitivity at the individual level.
Data analysis
Random forest models were implemented in Python (version 3) using the Scikit-Learn (40) and Imbalanced-Learn (41) libraries. To prepare the data for the random forest analyses, the dataset was first split 80/20, reserving 20% of the data for a final test set. The remaining 80% was then split 75/25, reserving 25% as a validation set, for an overall 60/20/20 train/validation/test split. Because the data are imbalanced, all divisions were made preserving the original ratio of concussion to control subjects as closely as possible, by performing random splits within each pool of data (i.e., concussion and healthy).
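The nested, group-stratified split described above can be sketched with scikit-learn (the feature matrix here is a synthetic stand-in, and exact subset sizes depend on rounding):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 1 = concussion (n = 28), 0 = control (n = 379)
y = np.array([1] * 28 + [0] * 379)
X = np.random.default_rng(0).standard_normal((len(y), 164))  # placeholder features

# First split: hold out 20% as the final test set, stratified by group
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)

# Second split: 25% of the remainder becomes the validation set
# (0.80 * 0.25 = 0.20 of the whole), giving 60/20/20 overall
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=0)
```

Passing `stratify=` performs the random split within each class pool, preserving the concussion-to-control ratio in every subset.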
Models were trained using the 60% training set (n = 209) and evaluated on the 20% validation set (n = 81) to search for optimal parameters; the optimal parameters varied slightly depending on the metric being evaluated. The primary model parameter tuned was the number of estimators. Class weighting was set to a 15:1 ratio to mirror the approximate ratio of the data imbalance. The decision trees were trained to maximize the Gini gain (with “gini” selected as the criterion upon which split quality was assessed in `scikit-learn`). Once the optimal parameters were determined, a finalized model was trained on the combined training and validation sets.
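A minimal sketch of this tuning procedure in scikit-learn, using synthetic stand-in data (the grid of estimator counts and the label coding are assumptions for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (164 features per subject)
rng = np.random.default_rng(0)
y = np.array([1] * 28 + [0] * 379)           # 1 = concussion, 0 = control
X = rng.standard_normal((len(y), 164))
X[y == 1] += 0.5                             # weak signal for the minority class

X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=0)

# Tune the number of estimators against the validation set; the 15:1
# class weighting mirrors the approximate data imbalance, and "gini"
# is the split-quality criterion.
best_score, best_model = -1.0, None
for n_estimators in (100, 250, 500):         # hypothetical search grid
    model = RandomForestClassifier(
        n_estimators=n_estimators, criterion="gini",
        class_weight={1: 15, 0: 1}, random_state=0)
    model.fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_score, best_model = score, model

# Refit the chosen configuration on train + validation before the final test
final = RandomForestClassifier(**best_model.get_params())
final.fit(np.vstack([X_train, X_val]), np.concatenate([y_train, y_val]))
```

Refitting on the combined train and validation data uses all non-test samples while keeping the held-out test set untouched until the final evaluation.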
Performance was assessed on the final test set (n = 81), and input features were ranked and weighted by importance. After the feature ranking, the bottom 20% of features by weight were removed, and this process was repeated iteratively until fewer than 10 features remained. The procedure was then repeated for each of the six metrics, and once more to train an overall classifier using features from all metrics.
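The iterative pruning loop might be sketched as follows (synthetic stand-in data; the per-iteration model settings are assumptions carried over from the tuning description):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data for illustration (164 features per subject)
rng = np.random.default_rng(0)
y = np.array([1] * 28 + [0] * 379)
X = rng.standard_normal((len(y), 164))

# Iteratively retrain, rank features by importance, and drop the bottom
# 20% until fewer than 10 features remain.
features = list(range(X.shape[1]))
while len(features) >= 10:
    rf = RandomForestClassifier(
        n_estimators=100, criterion="gini",
        class_weight={1: 15, 0: 1}, random_state=0)
    rf.fit(X[:, features], y)
    order = np.argsort(rf.feature_importances_)       # ascending importance
    n_drop = max(1, int(0.2 * len(features)))
    features = [features[i] for i in order[n_drop:]]  # keep the top 80%
```

Dropping a fixed fraction per iteration shrinks the feature set geometrically, so relatively few retraining rounds are needed to reach the stopping threshold.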