Human verbal conversations can be considered in two phases: reception and expression. Reception happens in two stages: initially, each sound in the heard sentence is recognized and decoded via phonological processing, followed by integration with higher-order functions to understand the semantic meaning1–5. The second phase, expression, also consists of two parts. Semantically coherent words are retrieved, followed by overt speech production and monitoring of one’s own speech5–8. This complex, multi-stage process is believed to be supported by a broad cortical network incorporating a combination of local processing and long-range inter-lobar connections. The precise spatial configuration and temporal dynamics of the language network remain areas of active research, and multiple competing models exist2–4,9,10. Common to these models is evidence that long-range white matter tracts (e.g., the arcuate and uncinate fasciculi) from and toward the temporal lobe play a central role in the language network. These inter-lobar pathways are suggested to exchange language information, including mental representations of sentences, and to facilitate the translation of sound and semantics into motor representations9–15. However, previous studies have not provided direct quantitative evidence of how much, how fast, via which fasciculus, and in which direction given inter-lobar fasciculi can transfer neural information between language areas supporting phonological and semantic processes. Collective evidence also indicates that the development of inter-lobar connectivity networks begins early in life and continues in an experience-dependent manner, possibly beyond adolescence16,17. Investigators have not reached a consensus on the developmental patterns of effective connectivity between the temporal and extratemporal lobe regions supporting different linguistic processing stages.
This multimodality study aimed to localize, quantify, and visualize the strength and dynamics of effective connectivity networks that support direct transfers of neural activity between the temporal and extratemporal lobe neocortices supporting distinct linguistic processing stages. Investigators have proposed several models for how the cerebral cortex transfers neural representations of phonological and semantic information. The classical Geschwind model of language-related connectivity did not distinguish separate pathways supporting the phonological and semantic domains18. Instead, it proposed that the left posterior temporal lobe neocortex transfers the entire linguistic content expressed by a given spoken sentence to the left inferior frontal gyrus (IFG) for subsequent speech output. Conversely, most current models propose that neural activity representing phonological and semantic domains is transferred via different pathways and bihemispheric involvement, though with left-hemispheric dominance2–4,10. Some models suggest that the human brain transfers neural representations directly from the posterior temporal to the extratemporal lobe language areas, mainly via the arcuate fasciculus3,4,10,11. Another model infers that auditory language information is transferred indirectly via another structure in the temporoparietal junction2. A third suggests that auditory semantic processing is initially supported by neural information transfer from the left posterior to the anterior temporal area, which in turn projects to the inferior frontal region, presumably via the uncinate fasciculus9; subsequent semantic processing to understand auditory sentences is proposed to be supported by neural transfer from the inferior frontal to the posterior temporal region mainly via the arcuate fasciculus9.
The sound processing network is often referred to as the ‘phonological loop.’ It is localized mainly in the inferior Rolandic area, superior temporal gyrus (STG), and the surrounding regions of both hemispheres19–23. The ‘phonological loop’ is believed to support the transformation of perceptual to motor representations of spoken sounds and the monitoring of the volume/pitch of one’s own spoken sounds7,8,12,19.
The semantic processing network comprises both temporal and extratemporal lobe regions, with left-hemispheric dominance, including the left IFG and inferior parietal region24–27. Linguistic processing is thought to be in part sequential5,28. Phonological processing of an auditory input can begin once auditory information is received by the primary auditory cortex, followed by semantic understanding processes that continue until after the auditory stimulus offset. Following processing of a heard linguistic input, semantic retrieval begins before an individual starts preparing an overt response. Although to some degree each of these processes overlaps during verbal conversations28, the linear nature of language processing suggests that late neuronal engagement between the offset of auditory sentence stimuli and the onset of overt responses during an auditory naming task reflects, at least in part, semantic understanding and retrieval5,29. Interventional evidence from an intracranial study supports this notion; high-frequency electrical stimulation of left-hemispheric regions showing such late neuronal engagement often results in a transient impairment of semantic retrieval with intact phonological repetition5.
Electrocorticography (ECoG) recordings during cognitive tasks can provide detailed estimations of the spatiotemporal distribution of corresponding brain activity. In particular, event-related augmentation of broadband activity, including the high-gamma range at 70–110 Hz, has been demonstrated to be an excellent surrogate marker for neural activation5,30−33. An increase in high-gamma amplitude is associated with increased firing rate on a single-neuron recording, hemodynamic response on functional MRI, and cortical metabolism on glucose-metabolism positron emission tomography34–36. The relevance of event-related high-gamma augmentation to language behavior has also been demonstrated. Naming-related high-gamma augmentation is capable of accurately predicting the language areas defined by electrical stimulation mapping (ESM) as well as postoperative language impairment37,38.
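As a concrete illustration of this high-gamma measure, the minimal sketch below (synthetic data only; the 70–110 Hz band edges, filter order, and all signal parameters are illustrative assumptions rather than the authors' exact pipeline) band-passes a simulated ECoG trace and takes its analytic (Hilbert) amplitude, so that an embedded 90 Hz burst yields a clearly elevated envelope:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def high_gamma_envelope(ecog, fs, band=(70.0, 110.0)):
    """Band-pass a trace to the high-gamma range and return its
    analytic amplitude (Hilbert envelope)."""
    nyq = fs / 2.0
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="bandpass")
    return np.abs(hilbert(filtfilt(b, a, ecog)))

# Synthetic 2 s trace sampled at 1 kHz: background noise plus a
# 90 Hz burst between 1.0 and 1.5 s, standing in for event-related
# high-gamma augmentation.
fs = 1000
t = np.arange(0, 2.0, 1.0 / fs)
rng = np.random.default_rng(0)
sig = rng.normal(0.0, 0.1, t.size)
burst = (t > 1.0) & (t < 1.5)
sig[burst] += np.sin(2 * np.pi * 90.0 * t[burst])

env = high_gamma_envelope(sig, fs)
# The envelope during the burst clearly exceeds the baseline envelope.
print(env[burst].mean() > 2 * env[~burst].mean())
```

The envelope, rather than raw band-passed amplitude, is used so that the measure tracks the slow modulation of high-gamma power over time instead of the oscillation itself.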
While task-related high-gamma connectivity measures can demonstrate statistical covariation, they do not provide causal evidence of direct connectivity. To quantify inter-lobar effective connectivity, we therefore measured cortico-cortical spectral responses (CCSRs) elicited by weak single-pulse electrical stimulation at two adjacent electrode contacts39. Both CCSRs and cortico-cortical evoked potentials (CCEPs) are powerful tools to quantify the strength and dynamics of effective connectivity in each direction between two remote regions39–43. The two measures generally reflect the same underlying neuronal process but highlight different aspects of the recorded signal. CCSRs comprise a summation of phase-locked and asynchronous responses, whereas CCEPs consist of phase-locked responses alone. CCSRs are agnostic to the polarity of cortical responses, whereas CCEPs may exhibit a variable polarity of responses depending on the structural features of the underlying cortex41. Previous studies suggest that early CCSRs roughly correspond to the early CCEP component (also known as N1), thus reflecting cortical excitation elicited by single-axonal neural propagation from the stimulus site39,41,44−47. This notion is supported by the observation that the peak latency of such early responses is correlated with the surface distance as well as the underlying white matter streamline length on diffusion-weighted imaging (DWI) tractography48–50. Late, low-frequency band CCSRs, roughly corresponding to the late CCEP component (also known as N2), are suggested to mainly indicate post-excitatory neuronal inhibition39,44,46 and represent an indirect measure of inhibitory networks that govern effective connectivity.
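The distinction between CCSRs (phase-locked plus asynchronous responses) and CCEPs (phase-locked responses only) can be demonstrated with a small synthetic sketch (all signal parameters are illustrative assumptions, not recorded data): averaging raw trials cancels any response whose phase varies from trial to trial, whereas averaging single-trial amplitude envelopes retains it.

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(1)
fs, n_trials = 1000, 50
t = np.arange(0, 0.5, 1.0 / fs)

# Each simulated post-stimulus trial contains a phase-locked decaying
# oscillation plus an "induced" oscillation with random phase per trial.
trials = np.empty((n_trials, t.size))
for i in range(n_trials):
    phase_locked = np.exp(-t / 0.05) * np.sin(2 * np.pi * 20.0 * t)
    induced = 0.5 * np.sin(2 * np.pi * 20.0 * t + rng.uniform(0, 2 * np.pi))
    trials[i] = phase_locked + induced + rng.normal(0.0, 0.05, t.size)

# CCEP-like measure: the plain trial average keeps only phase-locked activity.
ccep = trials.mean(axis=0)

# CCSR-like measure: averaging single-trial envelopes also retains the
# asynchronous (non-phase-locked) response.
ccsr = np.abs(hilbert(trials, axis=1)).mean(axis=0)

# Late in the window the phase-locked part has decayed, so the trial
# average is near zero while the amplitude measure remains elevated.
late = t > 0.3
print(ccsr[late].mean() > 3 * np.abs(ccep[late]).mean())
```

This is why the two measures complement each other: a response that is reliably evoked but not tightly time-locked survives in the CCSR while largely vanishing from the CCEP.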
The present study utilized a novel multimodal approach, integrating ECoG high-gamma augmentation during an auditory naming task with CCSRs and DWI-based tractography, to identify the strength, direction, and anatomical pathway of networks that support each stage of language processing. We used this integrated, multimodal perspective to test the spatiotemporal predictions of each proposed phonological and semantic network model19–23. To that end, for each stage of language processing, we quantified the strength and dynamics of the inter-lobar effective connectivity toward and from the temporal lobe and localized the corresponding white matter pathways. Auditory language-related cortical activity was measured using the dynamics of high-gamma amplitude modulations at each electrode site during an auditory naming task5. We then estimated the relative involvement of cortical areas in each linguistic processing stage with a principal component analysis (PCA) of the temporal variations in high-gamma amplitude modulations51.
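The PCA step can be sketched as follows (synthetic data; the two canonical time courses and mixing weights are hypothetical, chosen only to mimic an early stimulus-locked and a late response-locked high-gamma profile): treating each electrode site as an observation and its high-gamma time course as the feature vector, the leading principal components recover the shared temporal profiles, and each site's scores quantify its involvement in each profile.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sites, n_times = 40, 200
t = np.linspace(0.0, 2.0, n_times)

# Two hypothetical canonical high-gamma time courses: early
# (stimulus-locked) and late (response-locked) Gaussian bumps.
early = np.exp(-((t - 0.4) ** 2) / 0.02)
late = np.exp(-((t - 1.5) ** 2) / 0.05)

# Each site mixes the two profiles with random weights, plus noise.
weights = rng.uniform(0.0, 1.0, (n_sites, 2))
X = weights @ np.vstack([early, late]) + rng.normal(0.0, 0.02, (n_sites, n_times))

# PCA via SVD of the site-mean-centered matrix: rows of Vt are
# component time courses; U * S gives per-site component scores.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S ** 2 / np.sum(S ** 2)
print(explained[:2].sum())  # the two planted profiles dominate the variance
```

In this framing, a site loading heavily on the late component would be interpreted as preferentially engaged in post-stimulus (e.g., semantic retrieval) processing.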
Multimodal studies present a unique challenge for intuitive visualization and interpretation of results, which are often high-dimensional. Here, the spatiotemporal dynamics are represented in six dimensions: CCSR magnitude, CCSR dynamics, naming task-related high-gamma activity (HGA), and three spatial dimensions. To facilitate interpretation, we generated animations visualizing the strength and dynamics of effective connectivity (i.e., CCSR-based neural propagations) via white-matter tracts directly bridging cortical sites supporting specific linguistic processing stages. We refer to the multimodal analysis and corresponding animation-based atlas presented here as six-dimensional (6D) dynamic tractography. We expected that our 6D dynamic tractography would validate or revise existing neurobiological models of language organization and development2–4,9−11. We tested the specific hypothesis that extratemporal lobe sites supporting semantic retrieval would show more robust direct effective connectivity toward and from temporal lobe sites supporting the same linguistic stage than would sites supporting phonological processing. We also quantitatively determined which specific fasciculi support direct inter-lobar effective connectivity between given language sites in each direction. In this cohort of participants aged 5 to 20 years, we hypothesized that older age would be associated with stronger inter-lobar effective connectivity specifically between sites supporting the semantic rather than the phonological process.