Imaging and genetic data were acquired from infants recruited as part of the developing Human Connectome Project (dHCP) and released publicly as part of the 3rd neonatal data release31. The dHCP was conducted according to the principles of the Declaration of Helsinki and ethical approval was given from the UK National Research Ethics Service. Written parental consent was provided for all individuals.
2.1 Diffusion data acquisition
Participant selection is summarised in Fig. 1A. Of the total 783 individuals in the dHCP, only term born (born at least 37 weeks of gestational age) infants with available diffusion weighted imaging (DWI) data were selected (n = 467). This was to account for any possible effect of prematurity on WM development32. Multi-shell high angular diffusion imaging data were obtained using a protocol optimised for the neonatal brain as previously described33,34. Briefly, PGSE EPI sequence with multiband factor 4, TE/TR = 90/3800 ms, 1.5 x 1.5 x 1.5 mm resolution, SENSE 1.2, Partial Fourier 0.855 was used to yield 300 diffusion-weighted volumes per subject with four phase-encoding directions: b = 0 s/mm2 (n = 20), b = 400 sm/mm2 (n = 64), b = 1000 s/mm2 (n = 88) and b = 2600 s/mm2 (n = 128).
DWI processing was done as part of the dHCP pipeline31. Briefly, images were first denoised35 and corrected for Gibbs ringing artefacts36. Motion and image distortion were corrected using spherical harmonics and radial decomposition (SHARD)37 (n = 18 individuals excluded after failed SHARD reconstruction quality check and low radiology score (the images were scored by an expert radiologist from 1 to 5 with 5 denoting images of the lowest quality). Subsequently, inter-slice intensity variations were adjusted 38, and individual brain masks were then generated using FSL brain extraction tool (BET)39. Visual inspection of DWI and mask outputs was carried out by HL to remove any outliers with distorted brain masks (no outliers).
T2-weighted images were acquired using TSE sequence parameters TR = 12s, TE = 156 ms, and resolution 0.8 x 0.8 x 1.6. A series of motion correction40 and super-resolution reconstruction techniques41 were then employed to produce images of resolution (mm) 0.5 x 0.5 x 0.5. Subsequently, T2-weighted images were segmented with DrawEM neonatal segmentation algorithm42 as part of the dHCP pipeline43. Total brain volume (TBV) was calculated as the sum of the volumes of cortical white and grey matter, deep grey matter, cerebellum, and brain stem. The volumes were calculated as the total number of voxels multiplied by the voxel dimension.
2.2 Genetic data preprocessing
The genetic data quality control and preprocessing steps used in this study are described in Cullen et al.44 (Fig. 1A). Briefly, of the 842 saliva samples, genotype data were collected (Oragene DNA OG-250 kit) and genotyped for SNPs genome-wide on the Illumina Infinitum Omni5-4 v1.2 array. If multiple samples were provided, only one per individual was retained for analysis (randomly chosen). Excluded were also genotyped data based on the following criteria: completeness of less than 95%, gender discrepancy and genotyping failure of more than 1% of the SNPs. If the relatedness score between any individual pair was above a cut-off (pi_hat > = 0.1875), only one sample was randomly retained. Genotyped SNPs were filtered based on the following criteria: being non-autosomal, having minor allele frequency less than 0.05, missing in more than 1% of individuals or deviating from Hardy-Weinberg equilibrium with a p-value < 1 x 10− 5.
Subsequently, the dataset was imputed to the Haplotype Reference Consortium reference panel45 on the Michigan Imputation Server. The VCF files returned were converted to PLINK files using a genotype calling threshold of 0.9. The imputed SNPs were excluded based on the following criteria: having minor allele frequency less than 0.05, missing in more than 1% of individuals, deviating from Hardy-Weinberg equilibrium with a p-value < 1 x 10− 5, or having an imputation R2 value of less than 0.8. All quality control was performed with PLINK 1.9. This yielded 754 individuals with high-quality SNPs.
2.3 Population stratification
Ancestry subpopulations were identified by merging our cohort with 2504 individuals from the 1000 Genomes project46 using a subset of common autosomal SNPs. Principal component analysis (PCA) was then performed on the resulting genetic dataset with PLINK, and the resulting principal components (PC) were then used to visually assign infants to specific ancestral subpopulations. Since the discovery sample used to derive the genome-wide association study (GWAS) summary statistics included exclusively individuals of Danish ancestry, only infants of European ancestry were considered for the ongoing analysis. Here, 429 (57%) were determined to have European ancestry.
European ancestry PCs were then generated from the European infant subpopulation genetic data, and a visual examination of PC pairwise scatterplots was carried out to exclude ancestral group outliers (6 excluded). Of the remaining individuals, 221 individuals were term-born and had both imaging and genetic data available. (Table 1).
Table 1
Demography of the 221 infants studied in this study.
Sex (male/female) | 111/110 |
Gestational age at birth (week) | 40.1 ± 1.2 |
Postmenstrual age at scan (week) | 41.6 ± 1.7 |
Total Brain Volume (mm3) | 376314 ± 48579 |
2.4 Polygenic score calculation
Individual polygenic scores (PS) were calculated using PRSice-247 software and summary statistics derived from the largest to date autism GWAS25,48. Here, PS for each infant were estimated at 10 p-value thresholds (PT): 10− 8, 10− 6, 10− 5, 0.0001, 0.001, 0.01, 0.05, 0.1, 0.5, and 1, such that each score was composed of only those SNPs with autism GWAS association p-value less than the respective threshold. Genotype data of European individuals in the 1000 Genomes project was used as the external linkage disequilibrium reference panel 46.
2.5 Fixel-based analysis preprocessing
The FBA framework21, adapted to the neonatal cohort as described in Lautarescu et al.49, was implemented in MRtrix3 v.3.0.450 to derive white matter phenotypes examined in this study. Here, single fibre white matter (WM) and cerebrospinal fluid (CSF) tissue-specific response functions were calculated for each subject using the ‘dhollander’ method51 with tissue separation fractional anisotropy (FA) threshold of 0.15. Response functions representative of CSF and WM diffusion signals at 44 weeks were calculated by averaging all CSF and 21 WM response functions from subjects aged 44.1 weeks, respectively. Subsequently, the average response functions were used to compute individual white matter fibre orientation distributions (FODs) map using the multi-shell multi-tissue constrained spherical harmonics method52 and the estimated FODs were intensity normalised53,54.
To generate the study-specific FOD template, individually normalised FODs and masks were first registered from native to the 40-week anatomical template55 using structural registration. The resulting warps were provided as part of the dHCP extended-release 331. Briefly, the warps were created from stepwise registrations as follows: subject DWI to subject T2-weighted image (FSL FLIRT using a normalised mutual information metric), subject T2-weighted image to age-matched T2-weighted template (using symmetric normalisation (SyN) algorithm from Advanced Normalisation Tools (ANTs)56 and nonlinear diffeomorphic multimodal registration of T2-weighted image and grey matter/white matter tissue probability maps) and week-to-week nonlinear transformations to the 40-week dHCP extended atlas (ANTs SyN). Consequently, the individual FODs and masks were transformed to the 40-week anatomical template (using the structural registered warps) using cubic and linear interpolation, respectively. The study-specific mask and FODs templates were then created by computing the intersection of all subject 40-week masks and the average of all subject 40-week WM FODs, respectively, and regrided to 1.3 isotropic voxel dimension. Finally, subject 40-week warped FODs images were registered to the study-specific FODs template (resulting in nonlinear warps mapping subject to template space) and then transformed to the template space without FOD reorientation (as previously explained in Raffelt et al.21).
2.6 Fixel-wise metrics
A fixel analysis mask refers to a fixel grid on which fixel-related metrics are mapped and statistical analysis carried out. Here, a study-specific fixel analysis mask was generated by directly segmenting study-specific WM FODs using a peak amplitude threshold of 0.0657.
The amplitude of the WM FOD, which is proportional to the radial diffusion signal, provides a measure of the volume fraction of the fibres, or apparent fibre density (FD), along the corresponding fibre orientation21. Subject-wise fixel analysis masks were generated from individual WM FOD maps (warped to template space but not reoriented) and fixel-wise FD was computed as the integral of the corresponding FOD lobe58. Subsequently, subject-wise fixels were reoriented using the local angular transformation obtained from the subject-to-template warp. To achieve effective comparison between individuals, each subject-wise fixel and its corresponding FD value were assigned to a unique fixel on the study-specific template using a spatial correspondence algorithm with a maximum angle threshold of 45°.
While fixel-wise FD measures local changes in density within a fixel, it does not capture volume changes across a fibre bundle as a whole, whose width could stretch over multiple fixels18. Similar to tensor-based morphometry, fixel-based morphometry utilises subject-to-template warp to identify volume changes. However, it specifically measures local volume differences across the bundle, perpendicular to the fixel orientation, excluding changes due to the fibre bundle length21. Here, fixel-wise fibre bundle cross-section (FC) was computed using the subject-to-template warp and study-specific fixel analysis mask, and FC > 1 and FC < 1 denoted local relative expansion and contraction, respectively.
Finally, fibre density and cross-section (FDC) was calculated as the fixel-wise product of FC and FD. This metric reflects the combined volume change manifested as either within-voxel FD, macroscopic FC or a combination of both21.
2.7 Atlas registration and mean tract metrics extraction
Neonatal WM tracts parcellations and the fibre orientation distribution function (ODF) template at 40 weeks were obtained from Uus et al. (https://gin.g-node.org/alenaullauus/4d_multi-channel_neonatal_brain_mri_atlas)59 (Fig. 2). Firstly, the study-specific FOD template was registered to the neonatal atlas ODF template at 40 weeks using FOD registration to obtain non-linear deformation warps. Secondly, a binary mask for each of the 54 WM tract parcellations was generated from the neonatal WM tract parcellation. Thirdly, the individual binary mask were transformed to the study-group space using the warps and linear interpolation, where voxels were assigned to the tract with the highest interpolated value. Fourthly, a fixel analysis mask for each WM tract was then created by assigning all fixels within the voxel of the tract. Finally, average fixel-wise FD, log(FC) and FDC values were generated for each WM tract label in each individual.
2.8 Whole-brain tractography
Whole brain anatomically constrained probabilistic tractography was performed on the study-specific average FOD map to produce 20 million streamlines60,61. To improve the biological plausibility of the tractography process, the tractogram was subsequently filtered using the SIFT algorithm to match streamline density to the diffusion signal, resulting in a reduced subset of 2 million streamlines57. The fixel template and the reduced subset of streamlines were then used to generate a fixel-to-fixel connectivity matrix. In turn, this allowed for the smoothing of fixel measures across connected fixels and improved the power of subsequent fixel-wise statistical analysis53.
2.9 Statistical analysis
In the first analysis, the association between autism PS and tract-mean FD, log(FC) and FDC was estimated using linear regression models. Here, dependent variables were the mean fixel-wise measures, and independent variables were sex, GA, PMA, first 3 ancestry PCs, and autism PS. For examinations of the mean log(FC) and FDC, TBV was also included as a covariate (i.e., X ~ sex + GA + PMA + AncPCs + autism PS + (TBV), where X is mean tract FD, log(FC) or FDC). The variance R2 explained by the autism PS was reported as the difference between the R2 of the full model with the autism PS included as a covariate and the R2 of the null model without the autism PS included as a covariate.
Since PS across the 10 examined PT were highly correlated, a method to utilise eigenvalue variance was used to calculate the effective number of independent tests performed62. The resulting number of independent tests for PS was 6. Given that FDC is a multiplication of FD and FC, the number of independent tests for all fixel-wise metrics was established as 2. Therefore, the multiple-comparison Bonferonni adjusted P-value threshold for examination of mean fixel metrics across 54 tracts was determined as 0.05/(54 x 6 x 2) = 7 x 10− 5.
In the second analysis, the same linear regression model was fitted at each fixel in the fixel analysis mask (i.e., for whole brain WM: X ~ sex + GA + PMA + AncPCs + autism PS + (TBV), where X is fixel-wise FD, log(FC) or FDC). Here, connectivity-based fixel enhancement (CFE) was employed to assess the association between autism PS and fixel measures. CFE employs a threshold-free cluster enhancement algorithm to increase test statistics to structurally connected fixels63,64. Contrast matrices were created to test for either negative or positive association between autism PS and fixel-wise measures. The design matrix was created by standardising the covariates with an added column of ones as intercept. The null distribution was generated by recording the maximum CFE statistics at each of the 5000 permutations and a fixel is considered nominally significant if the whole-brain two-tailed family-wise error (FWE) corrected p-value < 0.025 (0.05/2). Accounting for all PS PT and fixel-wise measures yielded the whole-brain multiple-comparison Bonferonni adjusted PFWE < 0.025/(6x2) = 0.002.
2.10 Exploratory gene-set enrichment analysis
Exploratory gene-set enrichment analysis was performed on the autism SNP subset most associated with any WM mean tract fixel metric. Here, SNPs contributing to the autism PS threshold most strongly associated with the imaging phenotype were selected. Linear regression was performed for each SNP (i.e., WM mean tract fixel measure ~ TBV + GA + PMA + sex + ancestry PCs) to determine its association with the WM phenotype. Subsequently, the SNPs with p-value < 0.05 were retained for further analysis, as they were reasoned to be associated with both autism and the WM phenotype. To determine whether the SNPs converged on a common biological pathway, genes containing those SNPs were examined for their functions. Here, each SNP was mapped to a single gene if its base pair location was found within the start and stop coordinates of the gene according to the human genome build 37 (https://ctg.cncr.nl/software/magma). Consequently, the resulting gene list was functionally tested against a curated database of 13 159 gene sets (pathways) obtained from MSigDB v.7.5.1 (curated canonical pathways from Reactome, KEGG, Wikipathways and Gene Ontology; https://www.gsea-msigdb.org/gsea/msigdb/). A pathway was considered enriched (overrepresented) if the probability of observing the pathway-related genes in the user input gene list was different from random chance65. Hypergeometric test as implemented in GENE2FUNC on the FUMA platform66 (https://fuma.ctglab.nl/gene2func) was carried out using 19 427 genes from human genome Build 37 as background genes. Enriched pathways (Bonferroni corrected p-value < 0.05/13 159 = 3.79 10− 6) with at least 4 overlapping genes with the gene list were reported. To ensure consistency and reproducibility of the results, two additional tests were carried out. In the first test, we randomly selected from the SNP subset that contributed to autism PS PT = 0.01 the same number of SNPs that were associated with the WM phenotype of interest and performed the hypergeometric method. This test was simulated 1000 times, and the most enriched pathway in each run was recorded. Pathways specific to the WM phenotype were those that were observed in less than 5% of all random experiments. In the second test, we repeated our analysis with other similar bioinformatic tools, including DAVID v.2021 (http://david.abcc.ncifcrf.gov/) 65,67 and WebGestalt v. 2019 (http://www.webgestalt.org/)68 to confirm the FUMA findings.