TemplateFlow: a community archive of imaging templates and atlases for improved consistency in neuroimaging

8 Department of Otolaryngology, Harvard Medical School, Boston, MA, USA; 9 Department of Radiology, University Hospital of Lausanne and University of Lausanne, Lausanne, Switzerland.
Correspondence: RC <rastko@stanford.edu>, OE <phd@oscaresteban.es>

Neuroimaging templates and corresponding atlases play a central role in experimental workflows and are the foundation for reporting standardised results. The proliferation of templates and atlases is one relevant source of methodological variability across studies, which has been recently brought to attention as an important challenge to reproducibility in neuroscience. Unclear nomenclature, an overabundance of template variants and options, inadequate provenance tracking and maintenance, and poor concordance between atlases introduce further unreliability into reported results. We introduce TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ, a cloud-based repository of human and nonhuman imaging templates paired with a client application for programmatically accessing resources. TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ is designed to be extensible, providing a transparent pathway for researchers to contribute and vet templates and their associated atlases. Following software


Introduction
Brains are morphologically variable, exhibiting diversity in such features as overall size 1 , sulcal curvature 2 , and functional topology 3,4 . Morphological variability manifests not only in differences between brains but also in the way that a brain changes across its lifespan, as it is remodelled by development, ageing, and degenerative processes [5][6][7] . These morphological differences often correspond with the effects of interest in neuroimaging studies and hinder direct spatial comparisons between brain maps 8 . The substantial variability within and between individual brains necessitates a means of formalising population-level knowledge about brain anatomy and function. Neuroscientists have answered this need by creating brain atlases as references for understanding and contextualising morphological variability. Atlases encapsulate landmarks, features, and other kinds of knowledge about the brain as annotations that are consistent across individual brains.
The development of atlases in neuroscience has accelerated knowledge discovery and dissemination. Early endeavours, epitomised by the groundbreaking work of Brodmann, leveraged careful scrutiny of microanatomy and cytoarchitectonic properties in small numbers of brains 9,10 . Concurrent macroanatomical approaches, by contrast, identified common features in nuclear boundaries and cortical gyrification. Modern atlases advanced on these approaches by incorporating stereotaxy 11 , defining a basis set of coordinate axes over the brain and anchoring neural landmarks to coordinates. Initially developed over a century ago to surgically induce targeted brain lesions, the first stereotaxic apparatus informed early sectional atlases of the cat and macaque brains 11 . In humans, Talairach's assiduous postmortem examination of a single human brain produced a stereotaxic atlas that saw wide use 12 . Since then, neuroscientists have directed great efforts to improve existing 13 and generate new atlases of the neurotypical adult human 14 and nonhuman 15,16 brain; as well as developing, ageing, and neurologically atypical brains. For instance, new atlases and representative stereotaxic maps can be created for diseased 17 , infant [18][19][20] , and elderly 21 human populations or to capture the rapid postnatal development of nonhuman species 22,23 . Recently, atlasing endeavours have largely shifted away from the search for a single universal neuroanatomical pattern, instead making use of increasingly large samples with the aim of representing a population average of the distribution of morphological patterns.
On account of its relatively high spatial resolution, its capacity to image the entire brain, and its non-invasive acquisition protocols, magnetic resonance imaging (MRI) has revolutionised neuroscience in general and the atlasing endeavour 24 in particular. For three cardinal reasons, atlases have become an indispensable component of modern neuroimaging data workflows. First, group inference in neuroimaging studies requires that individuals' features are aligned into a common spatial frame of reference where their location can be called standard 8. Second, progress in software instruments that map homologous features between subjects 25,26 has enabled researchers to create population-average maps of a particular image modality with relative ease using commonly available software. These maps, called templates, are typically created by averaging images that are representative of the population of interest to a study 27,28. Concomitant advances in image acquisition, processing, and analysis have enabled new templates to build iteratively upon previous work 24. Third, templates engender a stereotaxic coordinate system in which atlases can be delineated or projected. Associating atlases with template coordinates also facilitates the mapping of prior population-level knowledge about the brain into images of individual subjects' brains (for instance, to sample and average the functional MRI signal indexed by the regions defined in an atlas 29).
Because they are integral to analytic workflows and because atlasing technology is continuously improving, a multiplicity of brain templates and atlases have been published 28. Factors that have stimulated the proliferation of templates include distribution within software toolboxes, data structuring conventions, maintenance, and issues with licensing and shareability. As a result, researchers have at their disposal a wealth of open-access templates and atlases as well as established protocols for the creation of study-specific alternatives. Recent research 30 has cautioned that a naive more-is-better assumption for methodological options lends itself to greater methodological flexibility, which can threaten the reproducibility of findings. This problem is evidently not new for the neuroimaging community 31, and the proliferation of templates and atlases adds to the methodological degrees of freedom available to researchers. In the particular case of templates and atlases, the problem is exacerbated by poor consistency across available alternatives, as exposed by Bohland et al. 32. Along similar lines, Yoon et al. cautioned about "template effects" confounding the interpretation of results from pediatric imaging studies based on a common adult reference template 27. Inaccurate reporting 33 completes this concerning picture, as it is often difficult (if not impossible) to map reports in the literature back to the actual template or atlas employed. Most prominently, the ubiquitous reporting of results (e.g., peak coordinates) "in the MNI (Montreal Neurological Institute) standard space" is inadequate because there is a large portfolio of templates 6,19,24,34 developed and distributed by the MNI. Moreover, the widespread FSL toolbox 35 references its results to an "MNI space" neither officially created nor distributed by the MNI.
To address the need for a centralized resource for the archiving and redistribution of templates and atlases that allows programmatic access to human and nonhuman imaging templates 36, we have developed TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ (Figure 1). The resource is envisioned to support the emergence of processing and analysis workflows 37,38 that brain mapping is witnessing, while addressing the above concerns threatening reproducibility. TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ is modular, and both data and software are version-controlled. The resource allows researchers to use templates "off-the-shelf" and share new ones. An online documentation hub provides further details and facilitates use (http://www.templateflow.org).

Results
The ambiguity of MNI space reporting, and the case for a centralised repository with uniform nomenclature.
The lack of consistent template nomenclature introduces ambiguity in scientific communication, which is further biased by a researcher's choice of software library. Perhaps most notably, the majority of the neuroimaging literature reports results as spatially normalized and given in Montreal Neurological Institute (MNI) standard coordinates. However, this statement is imprecise, as MNI offers a wide portfolio of MRI templates 19,24,34,39-42. To investigate the heterogeneous use of MNI templates in the neuroimaging literature, we performed an exploratory text mining analysis. Across the entire corpus of articles published in two leading methodological journals (NeuroImage and NeuroImage: Clinical), we identified 6,048 articles containing 14,870 sentences that included the term MNI. After preprocessing the sentence text, we used latent Dirichlet allocation 43 to create a topic model of the surveyed articles. A qualitative inspection of topic-word loadings suggested that of the 12 topics we identified, three provided insight into the likely provenance of the templates that an article used (Figure 2). Two of these corresponded to two software packages widely used in neuroimaging, SPM (topic 6) and FSL (topic 10), each of which is distributed with particular versions of the MNI template. The third (topic 4) related to the International Consortium for Brain Mapping (ICBM) and the MNI itself, the institutions that oversee the creation and curation of the MNI template portfolio. To demonstrate the heterogeneity of MNI template usage, we sorted articles according to their dominant topic (i.e., the topic with the highest model score in MNI-related sentences; Figure 3). We found that hundreds of articles featured each of the three provenance-related topics we had previously identified, underscoring that "MNI template" can refer to any of a family of templates and is not a unique identifier.
As a matter of fact, studies carried out with SPM96 44 and earlier versions report their results "in MNI space" with reference to the single-subject Colin 27 average template 42 . However, beginning with SPM99, SPM updated its definition of "MNI space" to the template that MNI released in 2001: an average of 152 subjects from the ICBM database, aligned by means of linear registration. In SPM12 (the latest release at the time of writing), the meaning of "MNI" varies by submodule: different modules alternately use the Linear MNI152 template and a new, nonlinear revision from 2009. By contrast, the "MNI space" template bundled with the FSL toolbox was developed by Dr. A. Janke in collaboration with MNI researchers 24 . Although it was generated under the guidance of and using the techniques of the 2006 release of nonlinear MNI templates, this template is not in fact part of the official portfolio distributed by MNI. Nonetheless, our results suggest that the MNI templates bundled with SPM and FSL have historically gained broader currency as a result of the widespread use of these software libraries.
Although our results are only a first approximation, and although they do not provide insight into the provenance of the majority of MNI templates, they present several important cases for consideration. First, absent an unambiguous reporting nomenclature (such as Research Resource Identifiers 45 ; RRIDs), the widespread use of ambiguous terms like "MNI" presents a potential barrier for reproducing results and increases the chance of misapplying coordinates or references from an incorrect space. Second, absent a readily accessible centralised repository, researchers might often default to templates that are easy to access, many of which are tied to specific software packages. Third, as illustrated by the changing definition of "MNI" in SPM, template references can change as new technologies emerge, suggesting an essential need for version control systems. Finally, our analysis illustrates a fourth requirement for unambiguous reporting of results: a consensus regarding the minimally sufficient provenance-related information to report in studies and to distribute with templates. We sought to develop TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ with these features in mind.

A version-controlled archive of neuroimaging templates maximizing the accuracy in reporting spatially standardized results.
TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ is a cloud-based repository of human and nonhuman imaging templates paired with a Python-based client for programmatically accessing template resources. TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ addresses the need for a standard, centralised repository of templates and corresponding atlases and metadata. The TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ Aʀᴄʜɪᴠᴇ has a tree-directory structure, metadata files, and data files following a standard inspired by the Brain Imaging Data Structure 46 (BIDS). BIDS is a widespread standard that balances the needs for human- and machine-readability. BIDS prescribes a file naming scheme comprising a series of key-value pairs (called "entities") that are ordered hierarchically.
The most salient entity is the template identifier (signified with the key tpl-), whose value is an alphanumeric label that is unique across the Aʀᴄʜɪᴠᴇ (e.g., tpl-MNI152Lin ). Table 1 enumerates several templates currently distributed with the Aʀᴄʜɪᴠᴇ, and their corresponding unique identifiers. The unique identifier resolves the issue of inaccurate reporting, as it unambiguously designates one specific template. In addition, because the repository is versioned, researchers can easily retrieve and report the exact version of the template or atlas that was used in their study. Supplementary Table S1 summarizes the available entities and shows a segment of the file organization of the Aʀᴄʜɪᴠᴇ. For each template, the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ database includes one or more reference volumetric template images (e.g., one T1-weighted and one T2-weighted average map; all must be in register), a set of atlas labels and voxelwise annotations defined with reference to the template image, and additional files containing the template and atlas metadata. Figure 4 summarizes the data types and metadata that can be stored in the Aʀᴄʜɪᴠᴇ. Figure 5 provides an overview of the Aʀᴄʜɪᴠᴇ's metadata specification.
Cloud storage for the Aʀᴄʜɪᴠᴇ is supported by the Open Science Framework (osf.io) and Amazon's Simple Storage Service (S3). Version control, replication, and synchronisation of template resources across filesystems is managed with DᴀᴛᴀLᴀᴅ 47 .

TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ is designed to maximise the discoverability and accessibility of new templates, minimise redundancies in template creation, and promote standardisation of processing workflows. To enhance visibility of existing templates, TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ includes a web-based browser indexing all files in the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ Aʀᴄʜɪᴠᴇ (templateflow.org/browse/).

Table 1. Digital templates included in TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ.

MNI152Lin 34,39
Neurotypical adult human template created as the average from a linear mapping of 152 subjects from the MNI cohort of the ICBM registered to the earlier MNI305 template.

MNI152NLin6Asym 24
FSL's version of the MNI152 neurotypical adult human template created using iterative nonlinear registration and averaging.

MNI152NLin2009cAsym 19,40
Update of the MNI152 neurotypical adult template with nonlinear registration. The mapping and averaging proceeded over 40 iterations beginning from the earlier MNI152 template.

MNIInfant 19
Series of human infant templates created from 11 cohorts of infants and young children. Each cohort spans a different age range between 0 and 60 months.

MNIPediatricAsym 19,40
Series of human pediatric templates created from 6 partially overlapping cohorts of children and young adults. Each cohort spans a different age range between 4.5 and 18.5 years.

NKI 14,48
Template created for the NKI-Rockland sample using ANTs diffeomorphic registration and averaging.

OASIS30ANTs 14,49
Template created using ANTs diffeomorphic registration and averaging for the Open Access Series of Imaging Studies (OASIS).

PNC 50
Pediatric and young adult template created using ANTs diffeomorphic registration and averaging for the Philadelphia Neurodevelopmental Cohort.

UNCInfant 20
Series of human infant templates created from a 95-subject longitudinal sample comprising three scans: as neonates, as one-year-olds, and as two-year-olds.

WHS 51-54
Waxholm space template created as an atlas of the Sprague-Dawley rat brain.

Fischer344 55
Rat template created as the average of 41 four-month-old animals from the Fischer 344 strain.

fsaverage 57
Surface-based average FreeSurfer template.

A key neuroimaging resource developed with the best software engineering standards and easily operable by machines.
TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ's Python client provides human users and software tools with reliable and programmatic access to the archive. The client can be integrated seamlessly into image processing workflows to handle requests for template resources on the fly. It features an intuitive application programming interface (API) that can query the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ Aʀᴄʜɪᴠᴇ for specific files (Figure 6). The BIDS-inspired organization enables easy integration of tools and infrastructure designed for BIDS (e.g., the Python client uses PyBIDS 58 to implement the queries listed in Table S1).
To query TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ, a user can submit a list of arguments corresponding to the BIDS-like key-value pairs in each entity's file name (e.g., atlas=Schaefer2018 to return files containing voxelwise annotations from the 2018 Schaefer atlas 59 ).
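The entity-based query described above can be illustrated with a small, self-contained sketch. It does not call the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ client or PyBIDS; the `INDEX` records and the `query` helper are hypothetical stand-ins that only show how key-value filters narrow a set of archive entries.

```python
from typing import Dict, List

# Hypothetical, hand-written index: one record of entities per archive file.
# A real index would be derived from the file names in the archive.
INDEX: List[Dict[str, str]] = [
    {"tpl": "MNI152NLin2009cAsym", "res": "01", "suffix": "T1w"},
    {"tpl": "MNI152NLin2009cAsym", "res": "01", "atlas": "Schaefer2018", "suffix": "dseg"},
    {"tpl": "MNI152NLin6Asym", "res": "02", "atlas": "Schaefer2018", "suffix": "dseg"},
]


def query(index: List[Dict[str, str]], **filters: str) -> List[Dict[str, str]]:
    """Return the records whose entities match every requested key-value pair."""
    return [
        record
        for record in index
        if all(record.get(key) == value for key, value in filters.items())
    ]


# All annotations from one atlas, regardless of template.
atlas_hits = query(INDEX, atlas="Schaefer2018")

# The same atlas restricted to a single template identifier.
specific_hits = query(INDEX, tpl="MNI152NLin2009cAsym", atlas="Schaefer2018")
```

Adding more filters can only shrink the result set, which is why a unique template identifier combined with an atlas label is enough to pin down a specific annotation file.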
To integrate template resources into neuroimaging workflows, traditional approaches required deploying an oftentimes voluminous tree of prepackaged data to the filesystem. By contrast, the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ client implements lazy loading, which permits the base installation to be extremely lightweight. Instead of distributing neuroimaging data with the installation, TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ allows the user to dynamically pull from the cloud-based storage only those resources they need, as they need them. After a resource has been requested once, it remains cached in the filesystem for future use.

Figure 5.
A template_description.json metadata file is displayed at left (for the pediatric MNI template). In addition to general template metadata, TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ datasets can contain cohort-level and resolution-level metadata, which are nested within the main metadata dictionary and apply only to subsets of images in the dataset.
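The lazy-loading and caching behaviour described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the client's implementation: the `LazyArchive` class, the `fake_download` function, and the example path are all hypothetical, and the "download" is simulated so that the sketch runs without network access.

```python
import tempfile
from pathlib import Path
from typing import Callable


class LazyArchive:
    """Fetch files on first request and serve them from a local cache afterwards."""

    def __init__(self, cache_dir: Path, download: Callable[[str], bytes]) -> None:
        self.cache_dir = Path(cache_dir)
        self.download = download  # injected so the sketch needs no network
        self.downloads = 0        # counts how often remote storage was hit

    def fetch(self, relative_path: str) -> Path:
        local = self.cache_dir / relative_path
        if not local.exists():
            # First request: pull the resource and store it in the cache.
            local.parent.mkdir(parents=True, exist_ok=True)
            local.write_bytes(self.download(relative_path))
            self.downloads += 1
        return local


def fake_download(relative_path: str) -> bytes:
    # Stand-in for a request to cloud storage.
    return f"contents of {relative_path}".encode()


with tempfile.TemporaryDirectory() as tmp:
    archive = LazyArchive(Path(tmp), fake_download)
    first = archive.fetch("tpl-Example/tpl-Example_T1w.nii.gz")
    second = archive.fetch("tpl-Example/tpl-Example_T1w.nii.gz")
    downloads_after_two_requests = archive.downloads
    same_path = first == second
```

Nothing is stored at installation time in this design; the cache grows only with the resources a workflow actually requests, and the second request for the same file never touches remote storage.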
We demonstrate benefits of centralising templates in general, and the validity of the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ framework in particular, via its integration into fMRIPrep 38 , a functional MRI preprocessing tool. This integration provides fMRIPrep users with flexibility to spatially normalize their data to any template available in the Aʀᴄʜɪᴠᴇ. This integration has also enabled the development of fMRIPrep adaptations, for instance to pediatric populations or rodent imaging, utilizing suitable templates from the archive. The uniform interface provided by the BIDS-like directory organisation and metadata enables straightforward integration of new templates into workflows equipped to use TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ templates.
TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ also makes use of standards of practice from the software engineering industry, leveraging continuous delivery (CD) and continuous integration (CI) tools to automate backup and synchronisation of data across projects in the templateflow organisation on GitHub. CI and CD keep the web-based archive browser up to date by automatically indexing data files.

Community-driven, peer-reviewed contribution process.
A centralised repository for neuroimaging templates should also address the needs of template creators, enabling peer-reviewed integration of new templates with minimal informatic overhead. Inspired by the Conda-forge community repository and the Journal of Open Source Software (JOSS), the GitHub-based TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ organisation is a site for dialogue between members of the neuroimaging community and TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ Aʀᴄʜɪᴠᴇ curators. GitHub issues offer any community member the ability to share their needs with developers and Aʀᴄʜɪᴠᴇ curators, for instance by identifying templates or workflow features for potential inclusion in the project. Pull requests provide a means for members of the community to directly contribute code or template resources to the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ Aʀᴄʜɪᴠᴇ.
This peer-reviewed contribution process is facilitated through the Python-based TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ Mᴀɴᴀɢᴇʀ. The TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ Mᴀɴᴀɢᴇʀ automates the work of synchronising data from a local directory to cloud storage in OSF. Furthermore, it creates a GitHub repository containing git-annex pointers that enable DᴀᴛᴀLᴀᴅ to download template data from cloud storage to any machine with a copy of the repository. Finally, it opens a new pull request to propose adding the newly contributed template repository into the main TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ Aʀᴄʜɪᴠᴇ (Figure 7).

Figure 7.
To contribute a new template to TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ, members of the community first organise template resources to conform to the BIDS-like TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ structure. Next, tfmgr synchronises the resources to OSF cloud storage and opens a new pull request proposing the addition of the new template. A subsequent peer-review process ensures that all data are conformant with the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ standard. Finally, TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ curators merge the pull request, thereby adding the template into the archive.

Discussion
The use of templates in neuroimaging is ubiquitous, and the emerging challenges regarding template use accordingly merit immediate attention. In an early perspective, Van Essen identified a set of desiderata for brain templates 36. Above and beyond anatomical fidelity, he called for connecting templates in a "federation of databases" with "powerful and flexible options for searching, selecting, and visualizing data". Finally, he stressed the importance of resource accessibility. TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ provides a clear foundation for a framework that satisfies all of the aforementioned desiderata. Furthermore, the Aʀᴄʜɪᴠᴇ supports unambiguous identification of resources, a programmatically extensible platform for interfacing with template data, and a starting point for future investigations of inter-template concordance and robustness.

TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ addresses practical issues biasing current template usage and creation.
When researchers develop a new brain template or atlas for public dissemination, there exists no standard channel or format for distributing their work. With no central repository or uniform organisational scheme, template creators are often tasked with the responsibilities of maintaining template resources and managing access on an ad hoc basis. The work resulting in a template is reviewed only at publication time; subsequent template updates can go unreviewed, and any academic consensus that emerges after publication might not be associated with the original template resources at all.
Conversely, users are confronted with a surfeit of available templates and atlases, many with unclear provenance, and with the attendant challenges of locating them and integrating them into workflows. As illustrated by our text mining analysis, software libraries can further bias template usage, and the lack of consistent nomenclature between packages (as is the case for MNI templates) introduces further ambiguity in scientific communication. Leading neuroimaging toolboxes, including FSL and SPM, are packaged with a limited set of default template resources. Such limitations lead researchers to decisions regarding the creation and use of templates determined more by convenience than scientific considerations.
Together, user-facing and developer-facing issues contribute to an environment of ambiguity. Even where best practices are known, they can be difficult to locate and follow. Awareness and discoverability of extant template resources are limited by access hurdles and the use of private, decentralised communication channels, thereby driving unnecessary proliferation of template options. Integrating a template into software requires a custom solution for every new template, increasing the burden on developers. By aggregating resources in a centralised and freely accessible archive, TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ maximises exposure of the scientific community to new templates and facilitates the dissemination of new template-based knowledge.

An unambiguous, accessible, and curated template portfolio.
Using a software package's default template without awareness of alternatives can yield suboptimal results with adverse effects on both reproducibility and reporting. To mitigate these difficulties, TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ aggregates representative template data for populations and species of scientific interest in a centralised, community-run archive. Furthermore, TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ's organisational scheme enables univocal identification of brain templates, resolving the ambiguity of references to "MNI" space and thereby facilitating experimental reproducibility. More broadly, TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ defines a BIDS-like language for references to brain templates and atlases 46, endowing researchers with a common vocabulary so that the results they report in future work can be traced unambiguously to specific resources and replicated precisely. The archive provides template data across a range of granularities; for instance, researchers who study developmental trajectories can use a generally representative youth template or a template specific to a narrower age range according to the objectives of their research.

An improved platform for vetting template-based knowledge and building consensus.
GitHub's pull request system and integrated peer review process provide a public forum for discussion and vetting of newly proposed templates and annotations. Each template can be treated as a versioned living document that is continually reviewed and republished as necessary. If a user questions the validity or currentness of archive resources, the platform provides an immediate channel for publicly raising and addressing concerns. The discussion and vetting process forms a record of outstanding issues and concerns with existing resources that can inform researchers about the strengths and limitations of resources available to them. Ultimately, the centralisation of resources and discussion can help create a scientific consensus. As new technology enables the refinement of template resources, TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ's integration with GitHub's version control system provides a built-in way to update existing templates and track revisions.

An intuitive API minimises the window for human error.
The TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ client implements an intuitive query system that enables programmatic retrieval of template resources from the archive. Using a command-line tool or the Python client API, scientists can easily integrate TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ into their neuroimaging analysis workflows to automate access to and use of template resources. Automation of workflows further promotes reproducibility by removing potential points of inconsistent or erroneous usage. Additionally, the BIDS-like organisation scheme of TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ resources facilitates integration with BIDS apps. Within fMRIPrep, for instance, the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ API enables flexible and systematic normalisation of preprocessed images to any template space requested by the user.

Delineating "standardness" of templates and coordinate spaces.
A dispute over what qualifies as a "standard" space stems in part from the limitations of a "one-size-fits-all" approach to coordinate spaces, in part from differing study objectives, and in part from the limitations of knowledge about the organisation of the brain. In particular, the substantial variability between brains manifests not only in the morphological positions of brain features, but also in the failure of some landmarks to consistently manifest in all subjects. Functionally, deep phenotyping studies 60 have similarly revealed that the subnetwork structure of individual human brains exhibits features not present in population-level averages, up to and including entire large-scale functional networks 61 . Furthermore, the relationships between many brain structures and their functions either are imprecisely characterised 62 , defy intuitive ontologies 63,64 , or vary among individual subjects. Such considerations call into question the "standardness" of standard spaces derived from population averages.
These challenges are doubly amplified when researchers aim to identify inter-species homologies in the architecture of the nervous system, an endeavour that might ultimately require defining an abstract relational space that leverages spatial geometry 65,66 rather than one tethered to an explicit, population-average template. The scope of the present work does not immediately encompass such challenges, but we hope that the resource introduced here can provide a starting point for future work in these veins. As registration frameworks and standard space definitions expand beyond anatomical averages and incorporate information from additional modalities 67,68 , we intend TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ to grow to accommodate these new types of resources.

Ethical compliance
We complied with all relevant ethical regulations. This resource reuses publicly available data derived from studies conducted at many different institutions. Protocols for all of the original studies were approved by the corresponding ethical boards.

Code & data availability statement
All the software components discussed in this paper are available under the Apache 2.0 license as repositories of the templateflow organisation on GitHub (https://github.com/templateflow).

Online Methods
MNI space text mining analysis. To investigate the use of the term "MNI" in the neuroimaging literature, we conducted an exploratory text mining analysis. For this, we used the Elsevier API to download the entire corpus of two leading journals of neuroimaging methodology, NeuroImage and NeuroImage: Clinical. In this way, we retrieved a total of 16,812 full-text articles that were subsequently segmented into lists of sentences. A scan of these sentences revealed 14,870 sentences across 6,048 articles that contained the word "MNI". Sentences were cleaned (i.e., removing punctuation, single letters, accents, numbers) and tokenized into words, which were subsequently lemmatized (i.e., converted to base form) using the NLTK wordnet lemmatizer. From the lemmatized words, we filtered out stopwords (i.e., NLTK stopwords and a custom list) and included words with a frequency above 10 as part of our "dictionary"; this yielded a dictionary size of 2,340 words.
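The cleaning and dictionary-building steps above can be sketched with the standard library alone. This is an illustration under simplifying assumptions rather than the analysis code: lemmatization with NLTK is omitted, the stopword set and the three sentences are invented for the example, and the frequency threshold is lowered so that the toy corpus yields a non-empty dictionary.

```python
import re
import unicodedata
from collections import Counter
from typing import Iterable, List, Set


def strip_accents(text: str) -> str:
    """Replace accented characters with their unaccented ASCII base letters."""
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def tokenize(sentence: str) -> List[str]:
    """Lowercase and keep alphabetic runs of two or more letters.

    This drops punctuation, numbers, and single letters in one pass.
    """
    return re.findall(r"[a-z]{2,}", strip_accents(sentence).lower())


def build_dictionary(sentences: Iterable[str], stopwords: Set[str], min_count: int) -> List[str]:
    """Keep non-stopword tokens whose corpus frequency is above min_count."""
    counts = Counter(
        token
        for sentence in sentences
        for token in tokenize(sentence)
        if token not in stopwords
    )
    return sorted(word for word, count in counts.items() if count > min_count)


# Invented example sentences and stopwords (assumptions for illustration only).
sentences = [
    "Images were normalized to the MNI template.",
    "Coordinates are reported in MNI space.",
    "The MNI template was used for normalization.",
]
stopwords = {"the", "were", "to", "are", "in", "was", "for"}

# The text uses a threshold of 10; the toy corpus here uses 1.
vocabulary = build_dictionary(sentences, stopwords, min_count=1)
```

Without lemmatization, "normalized" and "normalization" are counted as distinct words, which is one reason the described pipeline reduces words to their base form before counting.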
Next, we computed a sparse dictionary-by-article count matrix (i.e., 2,340 × 6,048), on which we performed topic modelling with latent Dirichlet allocation 43 (LDA; implementation from scikit-learn with the learning decay hyperparameter set to 0.7). The number of topics (k=12) was selected by identifying the LDA model yielding the lowest perplexity 43. The 20 words from the dictionary that loaded the highest on the 12 topics were visualized using word clouds. In addition, for each article we identified the most dominant topic and plotted the distribution of topics across articles.
Design and architecture. TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ comprises four cardinal components: (i) a cloud-based archive, (ii) a Python client for programmatically querying the archive, (iii) automated systems for synchronising and updating archive data, and (iv) inter-template registration workflows. Here, we discuss the details of each component's implementation in turn, as well as the manner in which they interact with one another to form a cohesive whole.
The TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ Aʀᴄʜɪᴠᴇ. The archive itself comprises directories of template data in cloud storage. The data are stored on Google Cloud using the Open Science Framework (OSF) and on Amazon's Simple Storage Service (S3). Prior to storage, all template data must be named and organised in directories conformant to a data structure inspired by and compatible with the Brain Imaging Data Structure (BIDS) standard 46. The precise implementation of this data structure is a living document and is detailed on the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ homepage (http://www.templateflow.org). We detail several critical features here.
The archive is organised hierarchically, and descriptive metadata follow a principle of inheritance: any metadata that apply to a particular level of the archive also apply to all inferior levels. At the top level of the hierarchy are directories corresponding to each archived template. If applicable, within each template directory are directories corresponding to sub-cohort templates. Names of directories and resource files constitute a hierarchically ordered series of key-value pairs terminated by a suffix denoting the datatype. For instance, tpl-MNIPediatricAsym_cohort-3_res-high_T1w.nii.gz denotes a T1-weighted template image file for resolution "high" of cohort 3 in the MNIPediatricAsym template (where the definitions of each resolution and cohort are specified in the template metadata file, template_description.json). The most common TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ datatypes are indexed in Table 1 of the main manuscript; an exhaustive list is available in the most current version of the BIDS standard (https://bids.neuroimaging.io/).
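To make the naming scheme concrete, the following minimal sketch splits a resource file name into its key-value pairs, suffix, and extension. The function name parse_resource_name is hypothetical and is not part of the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ client; the sketch assumes that pairs are separated by underscores, that keys and values are joined by a hyphen, and that the extension begins at the first dot. It handles only entity-named resources, so a file such as template_description.json is rejected.

```python
def parse_resource_name(filename):
    """Split a BIDS-like file name into (entities, suffix, extension)."""
    # Everything from the first dot onwards is the extension (e.g. ".nii.gz").
    stem, dot, ext = filename.partition(".")
    # The last underscore-separated element is the datatype suffix.
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        key, sep, value = pair.partition("-")
        if not sep:
            raise ValueError(f"not a key-value pair: {pair!r}")
        entities[key] = value
    return entities, suffix, dot + ext


entities, suffix, extension = parse_resource_name(
    "tpl-MNIPediatricAsym_cohort-3_res-high_T1w.nii.gz"
)
print(entities, suffix, extension)
```

Values are returned as strings; how a key such as the cohort or resolution should be interpreted is defined by the template metadata rather than by the file name itself.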
Within each directory, template resources include image data, atlas and template metadata, transform files, licenses, and curation scripts. All image data are stored in gzipped NIfTI format and are conformed to RAS+ orientation (i.e., left-to-right, posterior-to-anterior, inferior-to-superior, with the affine qform and sform matrices corresponding to a cardinal basis scaled to the resolution of the image). Template metadata are stored in a JavaScript Object Notation (JSON) file called template_description.json; an overview of metadata specifications is provided in Figure 5 of the main manuscript. In brief, template metadata files contain general template metadata (e.g., authors and curators, references), cohort-specific metadata (e.g., ages of subjects included in each cohort), and resolution-specific metadata (e.g., dimensions of images associated with each resolution). Atlas metadata are often stored in TSV format and specify the region name corresponding to each atlas label. Transform files are stored in HDF5 format and are generated as a diffeomorphic composition of ITK-formatted transforms mapping between each pair of templates.
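As an illustration of how TSV atlas metadata pair integer labels with region names, the sketch below reads a small table into a lookup dictionary. The column names index and name, the example rows, and the helper load_label_names are assumptions made for this example; the columns of a given atlas file should be checked before relying on them.

```python
import csv
import io

# Hypothetical contents of an atlas TSV file; real column names may differ.
tsv_text = "index\tname\n1\tCSF\n2\tGrey matter\n3\tWhite matter\n"


def load_label_names(handle, index_col="index", name_col="name"):
    """Map each integer atlas label to its region name."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {int(row[index_col]): row[name_col] for row in reader}


labels = load_label_names(io.StringIO(tsv_text))
print(labels[2])
```

With a file on disk, the same function can be called on an open file handle in place of the in-memory buffer.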
The archive has a number of client-facing access points to facilitate browsing of resources. Key among these is the archive browser on the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ homepage, which indexes all archived resources and provides a means for researchers to take inventory of possible templates to use for their study.
The Python client. TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ is distributed with a Python client that can submit queries to the archive and download any resources as they are requested by a user or program. Valid query options correspond approximately to BIDS key-value pairs and datatypes. A compendium of common query arguments is provided in Table 1 of the main manuscript, and comprehensive documentation is available on the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ homepage.
When a query is submitted to the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ client, the client begins by identifying any files in the archive that match the query. To do so, it uses PyBIDS 58 , which exploits the BIDS-like architecture of the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ Aʀᴄʜɪᴠᴇ to efficiently scan all directories and filter any matching files. Next, the client assesses whether queried files exist as data in local storage. When a user locally installs TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ, the local installation initially contains only lightweight pointers to files in OSF cloud storage. These pointers are implemented using DᴀᴛᴀLᴀᴅ 47 , a data management tool that extends git and git-annex. TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ uses DᴀᴛᴀLᴀᴅ principally to synchronise datasets across machines and to perform version control by tracking updates made to a dataset.
If the queried files are not yet synchronised locally (i.e., they exist only as pointers to their counterparts in the cloud), the client instructs DᴀᴛᴀLᴀᴅ to retrieve them from cloud storage. In the event that DᴀᴛᴀLᴀᴅ fails or returns an error, the client falls back on redundancy in storage and downloads the file directly from Amazon's S3. When the client is next queried for the same file, it will detect that the file has already been cached in the local filesystem. The use of resource pointers with the client thus enables lazy loading of template resources. Finally, the client confirms that the file has been downloaded successfully. If the client detects a successful download, it returns the result of the query; in the event that it detects a synchronisation failure, it displays a warning for each queried file that encountered a failure.
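The retrieval logic described above (cache check, primary retrieval, fallback to redundant storage, and a final verification with a warning on failure) can be sketched as follows. This is not the client's implementation: get_resource, is_cached, and the two callables are hypothetical, and an empty placeholder file stands in for a DᴀᴛᴀLᴀᴅ pointer purely for illustration.

```python
import tempfile
import warnings
from pathlib import Path


def is_cached(path):
    """Treat a non-empty regular file as locally available data."""
    return path.is_file() and path.stat().st_size > 0


def get_resource(path, primary, fallback):
    """Fetch `path` lazily: try `primary`, then `fallback`, then verify."""
    path = Path(path)
    if not is_cached(path):
        try:
            primary(path)
        except Exception:
            # Primary retrieval failed: fall back on redundant storage.
            try:
                fallback(path)
            except Exception:
                pass  # The verification step below reports the failure.
    if not is_cached(path):
        warnings.warn(f"could not synchronise {path.name}")
        return None
    return path


calls = []


def failing_primary(path):
    calls.append("primary")
    raise RuntimeError("simulated retrieval failure")


def working_fallback(path):
    calls.append("fallback")
    path.write_bytes(b"template data")


target = Path(tempfile.mkdtemp()) / "tpl-Example_T1w.nii.gz"
target.touch()  # Empty placeholder standing in for a pointer.
first = get_resource(target, failing_primary, working_fallback)
second = get_resource(target, failing_primary, working_fallback)
print(calls)
```

The second call finds the file already cached and triggers neither retrieval path, which is the behaviour that makes loading lazy.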
Continued functionality and operability of the client are ensured through an emphasis on maximising code coverage with unit tests. Updating the client requires successful completion of all unit tests, which are automatically executed by continuous integration (CI) and continuous delivery (CD) services connected to GitHub.
Ancillary and managerial systems. TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ includes a number of additional systems and programs that serve to automate stages of the archive update process, for instance the addition of a new template or the revision of current template resources. To facilitate the update and extension process, TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ uses GitHub Actions to automatically synchronise dataset information so that all references remain up-to-date with the current dataset. These actions are triggered whenever a pull request to TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ is accepted. For example, GitHub Actions are used to update the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ archive browser so that it displays all template resources as they are uploaded to the archive.
Whereas the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ client synchronises data from cloud storage to the local filesystem, a complementary TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ manager handles the automated synchronisation of data from the local filesystem to cloud storage. The Python-based manager is also used for template intake, i.e., to propose the addition of new templates to the archive. To propose adding a new template, a user first runs the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ manager using the tfmgr add <template_id> --osf-project <project_id> command.
The manager begins by using the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ client to query the archive and verify that the proposed template does not already exist. After verifying that the proposed template is new, the manager synchronises all specified template resources to OSF cloud storage. It then creates a fork of the tpl-intake branch of the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ GitHub repository and generates an intake file in Tom's Obvious Minimal Language (TOML) markup format; this intake file contains a reference to the OSF project where the manager has stored template resources. The TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ Mᴀɴᴀɢᴇʀ commits the TOML intake file to the fork and pushes to the user's GitHub account. Finally, it retrieves template metadata from template_description.json and uses the metadata to compose a pull request on the tpl-intake branch. This pull request provides a venue for discussion and vetting of the proposed addition of a new template.
Inter-template registration workflow. To effect the flow of knowledge across template spaces, TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ includes a workflow for computing robust transformations between any pair of adult human template spaces. To compute a transformation between two template spaces, the inter-template registration workflow makes use of 10 of the high-quality T1-weighted adult human brain images used in the creation of the MNI 152 template portfolio. In the first step of the workflow, these 10 images are registered to both template spaces using the symmetric normalization (SyN) algorithm 25. Next, a 10-channel registration is performed in ANTs using the SyN algorithm. Thus, the workflow computes a single transformation that simultaneously optimises the alignment between all 10 images in both coordinate spaces.

| Data entity | API query example | Description |
|---|---|---|
| Template | "MNI152Lin" | The template dataset to which an image or other data file belongs. |
| Resolution | res=1 | The image resolution. Each resolution is assigned a key, which is defined in the res field of template_description.json. |
| Mask | desc="brain", suffix="mask" | Indicates that the image is a binary-valued annotation, where voxels labelled 1 are part of the mask. |
| Discrete segmentation | desc="malf", suffix="dseg" | Indicates that the image is an integer-valued annotation. Each segmentation image file (.nii.gz format) is paired with a dictionary of segment names (.tsv format). |
| Probabilistic segmentation | label="CSF", suffix="probseg" | Indicates that the image is a probabilistic annotation, wherein the value of each voxel indicates the probability of that voxel belonging to the specified label. |
| Transformation | from="MNI152Lin", suffix="xfm" | File containing a mapping between 2 stereotaxic coordinate spaces. The source space is defined in the from field, while the target space is defined in the tpl field. |
| Template cohort | cohort=1 | Subsample of a dataset used to generate an average template. |

| Argument | Environment variable | Specifications |
|---|---|---|
| template_id | | Identifier of the template. This is the value of the tpl field in all file names. |
| --osf_project | OSF_PROJECT | The OSF project where the template data are to be stored. The project must be writable by the user account whose credentials are specified in the --osf-user and --osf-password arguments. |
| --osf_user | OSF_USERNAME | Account username or identifier for OSF cloud storage. |
| --osf_overwrite | | Flag that indicates that the OSF client should force the overwrite of any existing files in the OSF project that have names conflicting with those of new files. |
| --gh-user | GITHUB_USER | Account username for GitHub. The user account whose credentials are provided must have a fork of the TᴇᴍᴘʟᴀᴛᴇFʟᴏᴡ repo. |
| --path | | Path to a local directory where template resources are located. The path must either be a directory whose name is tpl-<template_id> or contain such a directory. |
| --nprocs | | Maximum number of parallel processes to run when uploading to or fetching from OSF. |