Combinatorial quantification of distinct neural projections from retrograde tracing

Comprehensive quantification of neuronal architectures underlying anatomical brain connectivity remains challenging. We introduce a method to identify the distinct axonal projection patterns from a source to a set of target regions and the count of neurons with each pattern. For a source region projecting to n targets, there are 2n — 1 theoretically possible projection types, although only a subset of these types typically exists. By injecting uniquely labeled retrograde tracers in k regions (k < n), one can experimentally count the cells expressing different combinations of colors in the source region1,2. Such an experiment can be performed for n choose k combinations. The counts of cells with different color combinations from all experiments provide constraints for a system of equations that include 2n — 1 unknown variables, each corresponding to the count of neurons for a projection pattern. Evolutionary algorithms prove to be effective at solving the resultant system of equations, thus allowing the determination of the counts of neurons with each of the possible projection patterns. Numerical analysis of simulated 4 choose 3 retrograde injection experiments using surrogate data demonstrates reliable and precise count estimates for all projection neuron types. We illustrate the experimental application of this framework by quantifying the projections of mouse primary motor cortex to four prominent targets: the primary and secondary somatosensory and motor cortices.

1 unknown variables, each corresponding to the count of neurons for a projection pattern. Evolutionary algorithms prove to be effective at solving the resultant system of equations, thus allowing the determination of the counts of neurons with each of the possible projection patterns. Numerical analysis of simulated 4 choose 3 retrograde injection experiments using surrogate data demonstrates 25 reliable and precise count estimates for all projection neuron types. We illustrate the experimental application of this framework by quantifying the projections of mouse primary motor cortex to four prominent targets: the primary and secondary somatosensory and motor cortices.

Introduction 30
The mouse brain contains over 70 million neurons. To chart how this overwhelmingly large number of neurons are interconnected is a core mission of the BRAIN Initiative to advance our understanding of the structural and functional organizational principles of the mammalian brain 3 . Along with rapid developments of genetic and viral sparse labeling, 3D microscopic imaging, and computational tools for single neuron reconstructions, tens of thousands of single neurons have been reconstructed with 35 detailed axonal trajectories [4][5][6] . Yet, a comprehensive whole brain wiring diagram at single neuron resolution remains a formidable challenge because of the sheer complexity of the brain and the laborious work of neuronal reconstruction methods 7 . Most anatomical regions in the mammalian brain project to multiple distinct locations 8 . A fundamental relation between macroscopic regional connectivity in the brain and microscopic cellular architecture is that if a source region projects to a 40 target region, there must be at least one neuron type with soma located in the source region whose axon extends to the target region 9 . Consider a source region projecting to target regions. Several kinds of axonal architecture can possibly serve as cellular substrates. For instance, there could be distinct groups of neurons, each extending their axons to a single target region. Alternatively, or in addition, some neurons might sprout their axonal branches into all target regions. Many additional groups of 45 neurons may exist, each reaching a distinct subset of the target regions. We maintain that sets of neurons with distinct projection patterns (e.g., a neuron projects to regions a, b, c, d; another neuron projects to regions b, c, and e) belong to different classes. Thus, identifying the projection pattern of each neuron constitutes a form of neuronal classification.
A source region projecting to target regions may potentially contain up to 2 − 1 projections 50 neuron types based on distinct patterns of axonal presence or absence in each target region. If we wish to find the numbers of neurons in each of the potential types (the projectomics census), we need 2 − 1 integers. A brute force approach to this challenge might entail labeling individual neurons to visualize their axonal projections [4][5][6]10,11 . However, this requires a representative sample for each class, which demands an impractical number of reconstructions while still missing the rarest classes. Moreover, in 55 such an approach, each source region requires its own set of experiments. In this paper, we introduce a practical and scalable strategy to estimate the number of neurons with each distinct projection pattern from multi-label retrograde tract tracing.

The concept behind combinatorial projectomics
Suppose we could inject in each of the target regions a uniquely labeled retrograde tracer, 60 corresponding to 'colors' 1 , 2 , … . When analyzing the somata in the source region, those co-labeled with all the 1 , 2 , … colors would correspond to the class of neurons projecting to all regions.
Those co-labeled only with colors , and , but none of the other colors, would correspond to the class projecting to regions , and and not to the other regions. Counting the cells with each combination of labels, possibly leveraging recent progress in automation and computer vision 12 , would solve the census 65 challenge. Importantly, this analysis can be carried out in parallel on all source regions projecting to the target regions.
The above-described thought experiment has two major limitations. First, the number of regions targeted by a typical source region in the mammalian brain is larger than the number of distinct retrograde tracers that can be practically injected in vivo. For example, individual source regions in the 70 mouse neocortex project from ~5 to more than 30 potential brain-wide targets 1,13 , whereas state of the art tract tracing is limited to triple or at most quadruple injections 14 . Second, retrograde tracer injections must be confined to a portion of the target region in order to minimize the risk of spilling into adjacent regions, which would contaminate the results 15 . Thus, in a subset of neurons of the source region that do project to the given target region, but not to the injected portion, the soma will be 75 unlabeled and thus missed in the cell counts.
Here we introduce an experimental and analytic design that overcome both of the above limitations. The basic idea is to perform multiple retrograde tracing experiments each covering a subset of the target regions. This is conceptually analogous to the shotgun sequencing strategy in genomics 16 .
Every experiment allows the determination of the number of somata in the source region that are co-80 labeled by any combination of retrograde tracers. The target regions not covered by a given experiment will contribute certain free variables to the count of neurons with each projection pattern, corresponding to the first limitation. Moreover, the exact proportion of neurons that project to the target regions covered by a given experiment, but not labeled due to the second limitation, will contribute additional free variables. Different experiments cover every target region several times in 85 different combinations, creating a many-to-many relation between the counts of co-labeled somata and the free variables across sets of multiple retrograde tracings. The key to the solution is to obtain several numerical constraints sufficient to estimate all free variables. In other words, enough experiments must be carried out so that the number of co-labeled somatic counts is sufficiently greater than the number of free variables that need to be found. 90 To explain this approach with an example, consider a source region projecting to four target regions. This scenario yields 15 (2 4 − 1) possible projection patterns and corresponding potential neuron types: 4 types projecting to just one of the targets, 6 types projecting to two targets, 4 types projecting to 3 targets, and 1 projecting to all 4 targets (Fig. 1). Now suppose we can only inject three retrograde labels, conveniently referred to as green, red, and blue. We then perform four experiments, 95 each leaving out one of the four target regions. The first experiment injects the green retrograde label in the first target region, the red one in the second, and the blue in the third, leaving out the fourth target region. From this experiment we can count the number of cells in the source region that are only labeled green, only red, or only blue; those that are co-labeled with each of the three two-color combinations, and those that co-labeled by all three colors, for a total of seven distinct numerical values. Which of the 6 15 neuron types contribute to the count of the somata that are only labeled green? All of those cells must project to the first target, but not all cells that project to that target will be colored green due to the second limitation. Moreover, the green-only cells also include the neurons projecting to both the first and the fourth target, since the latter was not injected. Lastly, we need to account for the cells projecting to both the first and second target which were not labeled red and similarly for all other 105 neuron types that projects to multiple targets as long as they include the first one.
To quantify these contributions, we adopt the following notation: let be the count of somata exclusively labeled green, 1 the number of neurons that project only to the first target, and 1 the proportion of neurons projecting to the first target that are labeled green (where 1 <1 due to the second limitation). Similarly, 2 and 2 are the number of neurons that project only to the second target 110 and the proportion of those neurons that are labeled red, respectively (and same for 3 , 3 etc.).
Furthermore, 12 represents the number of neurons that project just to the first and second target and, by extension, 1234 is the number of neurons that project to all four targets. We can then formulate the following equation: Additionally, we wish to note that with quadruple retrograde tracing we can in principle run ( 4 ) distinct experiments. Each experiment will give us the observed number of cells that project to at least 1 of four classes, 2 of four, 3 of four, or 4 of four (total 15 observations). Every retrograde injection also contributes an additional real value variable, namely the fraction of cells projecting to that region that 125 are in fact labeled. For example, if = 7, we have 127 potential neuron types (2 7 -1) and thus need to estimate 127 counts. There are 35 distinct experiments (( 7 4 )) and each will provide 15 observations and require estimation of 4 additional variables. The total number of equations is thus 35x15=525 and the number of unknowns is 127+35x4=267. Note that a ( 7 4 ) model presents a higher constraints-tounknowns ratio (525:267) than the simpler ( 4 3 ) model described earlier (28:27). 130 Although only positive values are acceptable for the model solution, the high degrees of nonlinearity in the system of equations present multiple positive solutions, when the system is not sufficiently constrained. In the next section (Section 3), we show that the solutions could become more reliable and robust to experimental error even in the simpler ( 4 3 ) model by repeating an experiment which increases the number of constraints by 7 while only adding 3 unknown variables. This is practically 135 significant, since this allowed us to reliably apply the ( 4 3 ) model to the triple-injection retrograde labelling data acquired from the motor and sensory cortices of the mouse brain, which we present in Section 4.

Reliable estimation of the population sizes of projection patterns using a ( ) model
In this section, we validate the solvability of the model presented in the previous section by simulating 140 retrograde labeling using surrogate counts for the population sizes of different projection patterns (Fig.   2). We also evaluate the extent to which repeated trials of combinatorial labeling increases the reliability in estimating the surrogate counts in a simulated 4-target and 3-injection configuration.
A "triple-injection" refers to parallel injections in 3 of the 4 targets selected. One repeated trial of an injection refers to a triple injection repeated for the same three targets in with an 145 assumed variability for the fractions of axons being labeled. Note that each triple-injection or its injections, and each of the models was solved using an evolutionary algorithm (EA) 17 . We ran multiple trials of the EA with different initial conditions to explore the parameter space and identify all possible solutions to the model (see Section 6). Convergence patterns of the EA populations showed that the 12triple injection model created an optimization landscape that is more convex than that of the 4-triple injection model (Fig. 3C), although the 12-triple injection model required more EA generations for the 160 population to converge (Fig. 3B) due to higher number of unknown variables that needed to be estimated. In other words, the 12-triple injection model resulted in narrower range of solutions, which were also more accurate than the 4-triple injection model (Fig. 3C&D). Furthermore, a systematic reduction of the error in the estimated counts (Fig. 3D) from 4-to 8-and 12-triple injections shows that enhancing the combinatorial model description to include repeated trials of experiments can make our 165 approach extremely reliable. Similar systematic reduction in the error was also observed when these analyses were performed for larger counts of projection patterns (see supplementary figure Fig. S1), although 20-triple injections were required to reliably estimate the counts totaling ~100,000. This suggests that the accuracy of estimations depends on the total count of all projection patterns, in addition to the number of injection experiments. It should be noted that the repeated injections must 170 be non-unique in the sense that they must introduce variability in the real values (i.e., fractions of axons labeled), so that they produce distinct constraints. However, such a variability is naturally expected in experimental settings. It is also interesting to observe that the second limitation described in Section 2 becomes a crucial advantage here.

A proof-of-concept experimental application to primary and secondary motor and sensory cortices 175
To test our analysis design experimentally, we selected the upper limb area of the primary motor cortex (MOp-ul) of the mouse brain as one source region to quantify its target-specific cortico-cortical projection neurons. Previous work 1,5,18   distinct triple injections, one repeated trial of one experiment and two repeated trials of another experiment. Additionally, three models were described using the segregated counts from MOp-ul layers 2/3, 5, and 6 to delineate layer-specific projection patterns from the source region. Combined and segregated constraints obtained from 7 injection experiments are provided in supplementary Table S1.
The unknown variables of the four models were independently estimated by the solver (see 195 Section 6 for computational details). Table 1  Finally, the solver was also run for a null model where the constraints represented random noise, which 210 provided a baseline for comparing the convergence against that of the actual experimental constraints.
The solver convergence patterns, and their robustness are given in Fig. 5. While the solutions to the model describing the surrogate counts converged to lower root-mean square deviation (RMSE) than the model that described actual experimental constraints for 7 triple-injections, they both outperformed the null model by a much more substantial margin (Fig. 5). 215

Discussion
Connectomics has risen to high prominence in neuroscience in the 17 years since the term was coined 21 .
It is now broadly recognized that regional connectivity underlies distributed brain function and single neuron axonal projections underlie regional connectivity 22 . Online 3D atlases of regional connectivity for the mouse brain 13,23 provide an instrumental high-level blueprint of the main communication pathways 220 in mammalian central nervous systems. However, they lack the resolution to identify individual neurons, arguably the key elements for computational function. At the opposite extreme, electron microscopy offers the ultimate opportunity to densely reconstruct every single synapse, but only for local networks of mammalian brains in the foreseeable future 24,25 . Long-range axons constitute the conceptual and physical nexus between brain-wide circuits and synaptic communication. Although single-neuron 225 projection axons can be reliably reconstructed throughout the mouse brain from light microscopy imaging 4,6,10,11 , scaling up the digital tracing process remains a formidable open problem 26 . At the same time, the typical divergence of regional connectivity in the mammalian brain poses a combinatorial challenge to the systematic characterization of the neuronal substrates. In this report, we introduced a possible solution based on quantitative analysis of multi-color retrograde injections. Our numerical 230 computations based on realistic surrogate data demonstrated the feasibility, scalability, precision, and robustness of this approach. Moreover, we offered initial empirical evidence of the applicability of the proposed methodology in the case of the mouse primary motor cortex efferent.
The experimental validation of this study is limited by the fact that it included data from only 7 injection experiments. Our analysis from Section 3 (see also Fig. 3) suggested at least 8 triple injection 235 experiments to achieve an average error of roughly 0.1 relative to the true counts. Thus, the high IQRs observed for some of the projection patterns given in Table 1 could be attributed to the sparseness of experimental data included in this study. Furthermore, while it is expected that the estimated total counts for each projection pattern in Table 1 would reflect the sum of their respective layer-specific counts estimated independently, there were differences between the combined and the sum of layer-240 segregated estimations. In addition to the sparsity of the experimental data, the noise introduced in segregating the layer-specific counts (see Supplementary Table S1) likely enhanced such differences. Therefore, the counts for projection patterns with high IQRs should be interpreted cautiously, and future studies with more injection experiments are required to fully validate the results presented in Table 1. However, our analyses of surrogate and real data collectively show that the population sizes of 245 various projection patterns between a source and four target regions can be reliably estimated given sufficient experimental constraints using the model presented in this study.
Our results are in fact consistent with those reported in the literature. Neuronal connectivity of the MOp has been studied extensively at macro-(regional specific) 1,5,[27][28][29] , meso-(cell type-specific) 5,23,30 and micro-scales (single neuron) 4-6 . At macroscale (regional specific), the MOp-ul shares extensive 250 reciprocal connections with multiple domains of the MOs, SSp and SSs 1 . At the meso-scale (cell type specific), cortico-cortical connections arise mostly from the intratelencephalic (IT) neurons across layer 2-6; while other two major classes of neuron types, pyramidal (PT) and cortico-thalamic (CT), generate much less collateral projections to other cortical areas 5,18,[27][28][29][30] . In a recent study combining viral sparse labeling, 3D microscopic imaging, and computational algorithms, detailed axonal projection trajectories 255 of ~300 individual neurons in the MOp were reconstructed 4-6 , in principle providing the initial core of a ground truth dataset for validating all possible axonal patterns shown in the current study (Table 1).
However, that cell type-specific single neuron reconstruction strategy relied on available Cre-driver mouse lines, as well as labor intensive and time-consuming 3D imaging and computational reconstruction procedures. Consequently, only a small fraction of MOp neurons was reconstructed, 260 providing insufficient information to generate a comprehensive landscape of individual neuronal projection motifs. This inadequacy highlights the difficulty to obtain sufficiently large sample sizes even with big data consortium efforts, underscoring the need for practical and scalable alternatives. In this perspective, our current combinatorial approach provides an important complementary approach for cataloging connectivity-based neuronal types in the mammalian brain-one major goal of contemporary 265 neuroscience research. Notably, the technique described in this report can be applied to increasingly large amounts of retrograde tract tracing data that are being systematically collected and deposited in the BICCN data portal and other open resources 1,7,21 .
Simulations from the current study show that the model becomes highly reliable for the ( Nevertheless, we have shown in this paper that our approach can, in principle, reliably and robustly quantify the cellular architectures of mammalian brain connectivity in a comprehensive manner. 280

Solving the systems of equations
The systems of equations were solved using evolutionary algorithms (EA). We employed ( + ) evolution strategies 17 without adaptive mutation to estimate the integer and the real-valued unknown variables in the combinatorial models. Briefly, offspring solutions are created from parents, and 285 selection pressure is applied to both parents and offspring solutions for survival into the next generation. A total of 50 trials of EAs were run, each with different initial conditions. The population sizes were set to = 50,000 and = 250,000 for all EA runs. An integer random-walk and a gaussian step mutations with mutation rates of 0.1 were applied for the integer and real-valued variables respectively. The total number of EA generations were set to 2500 for all analyses except for the model 290 describing a total surrogate count of ~100,000 (see supplementary Fig. S1), which used 10,000 EA generations. A Java-based evolutionary computing library (ECJ) 31 was utilized in this study. The ECJ configuration and the full set of EA parameters are described in the shared software (see code availability section) under EqnSolver/input/.params Surrogate counts for projection patterns were generated by first setting the counts of 7 295 randomly selected types to zero (since not all possible projection patterns are expected to be present between a source and a set of target regions) and then randomly generating counts for the remaining types. Median from the top 10 EA runs represented the adopted solution, and the error in the estimated counts (Ê) is defined as follows: and are the true sum and number of types respectively, and and are the true and estimated counts respectively of type .

Code availability
The software to reproduce the results included in this paper is available at https://github.com/sivaven/CombinatorialProjectomics.git Care and Use Committee.

Tracer injection experiments
The MCP uses a variety of combinations of anterograde and retrograde tracers to simultaneously visualize multiple anatomical pathways within the same Nissl-stained mouse brain. (pentobarbital) and trans-cardially perfused with 50ml of 0.9% saline solution followed by 50ml of 4% paraformaldehyde (PFA, pH 9.5). Following extraction, brain tissue was post-fixed in 4% PFA for 24-48hr at 4°C. Fixed brains were embedded in 3% Type I-B agarose (Sigma-Aldrich) and sliced into four series of 50μm thick coronal sections using a Compresstome (VF-700, Precisionary Instruments, Greenville, NC) and stored in cryopreservant at -20°C. All sections were stained with Neurotrace 435/455 (Thermo 340 Fisher Cat# N21479) for 2-3 hours to visualize cytoarchitecture. After that, sections were mounted onto glass slides and cover slipped using 65% glycerol.

Imaging and post-acquisition processing
The tissue sections were scanned on an Olympus VS120 slide scanning microscope with 10X objective. Each tracer was visualized using appropriate fluorescent filters and whole tissue section 345 images were stitched from tiled scanning into VSI image files. Raw images were corrected for left-right orientation and matched to the nearest Allen Reference Atlas 19,20 coronal levels. An informatics workflow was specifically designed to reliably warp, reconstruct, annotate, and analyze the labeled pathways in a high-throughput fashion through our in-house image processing software Connection Lens 2,32 . Threshold parameters were individually adjusted for each case and tracer, resulting in binary 350 image output files suitable for quantitative analysis. Adobe Photoshop was used to correct conspicuous artifacts in the threshold output files that would have spuriously affected the analysis. A separate copy of the atlas-registered TIFF image file was brightness/contrast adjusted to maximize labeling visibility and images were then converted to JPEG file format for online publication in the MCP iConnectome viewer (MouseConnectome.org). 355

Assessment of injection sites
All injection cases included in this work are, in our judgment, prototypical representatives of each brain area. We have previously demonstrated our targeting accuracy with respect to injection placement, our attention to injection location, and the fidelity of labeling patterns derived from injections to the same location (see Supplementary Methods in our previous reports 1,2 for details). 360

Data Analysis
For each brain, eight serial sections covering ARA 45 -59 of the MOp were used for quantification. Sections were cut at 50 microns; 200 microns were present between each serial section.
Retrogradely labeled neurons were revealed respectively by fluorescence of CTB conjugated with Alexa Fluor 488, 555, 647, and FG (in some cases, AAVretro-Cre also was used). NeuroTrace 435/455 (Blue 365 fluorescent Nissl stain; Invitrogen) revealed cytoarchitectonic background of each section to ensure accuracy of anatomical identification of those retrogradely labeled neurons. Sections were scanned on the Olympus VS-120 virtual slide microscope. Individual channels were exported to Photoshop and overlaid for manual annotation. Cells containing positive signal in each channel were annotated with a 10-pixel point. Distinct annotations with overlap > 80% were noted to be positive for each annotated tracer, and a combination point was made. Annotations were quantified using ImageJ. To avoid overcounting, each annotated cell was recorded only once in the datasheet, with a 3x tracer positive cell being absent from the six contributing 2x combinations and three contributing single tracer positives.
MOp layers were then parcellated into layers 1, 2/3, 5, and 6 based on their cytoarchitectural properties. Quantification of the annotated cells were then ascribed to each layer. 375