Statistical meta-analysis of microarray data
Gene expression datasets were selected based on a literature search for the terms “radiation”, “biodosimetry” and “mouse blood” on the NCBI gene expression omnibus website (https://www.ncbi.nlm.nih.gov/geo/) and are listed in Table 1 with the GEO accession numbers for uploaded datasets and publication PMIDs. We started by pooling the mouse microarray data sets using normalized and averaged signal intensities, dropping missing values wherever possible. Doses were as indicated in Table 1 (details in Supplementary data table 1). For some datasets only one non-zero irradiation dose was available, and in other cases a dose range. After merging all datasets, we retained n = 90 of 0 Gy samples; n = 14 of 1 Gy samples; n= 50 of 2 Gy samples; n = 59 of 3 Gy samples; n = 11 of 4 Gy samples; n = 60 of 6 Gy samples; n = 15 of 7 Gy samples; n= 6 of 8 Gy samples and n = 48 of 10 Gy samples.
Using the compiled data sets, we wrote customized code in the R programming language to look for genes positively or negatively correlated with radiation dose (based on Spearman’s correlation coefficients). We focused on genes that were stable over time (1-7 days, wherever the data was available) after irradiation by calculating the correlation coefficients with dose for all time points combined for each gene. We retained for further analysis only those genes whose correlation coefficients with dose were statistically significant (p-values <0.05 after Bonferroni correction).
Since the retained genes were often strongly correlated with each other and tended to have similar dose response shapes, we did not treat each gene as a separate predictor of dose, but instead sought to combine gene signals to reduce the data dimensionality and increase the robustness of resulting dose reconstructions. We selected the top 20 genes with positive correlation with dose, and the top 10 genes negatively correlated with dose and calculated median signals in each of these two groups. The difference between group medians was treated as the net signal (N), which was then used for dose reconstruction. Sensitivity analysis showed that choosing somewhat different numbers of genes in each group (e.g., 10 and 10, 30 and 30) did not substantially change the correlation of the net signal with dose.
From these calculations we obtained a list of dose-responsive genes (up and down) that were relatively stable over time (Supplementary data table 1 lists all the genes with annotations). A subset of these genes was then selected based on signal intensity range/relative copy number, for qRT-PCR analyses using an independently irradiated cohort of adult mice. For dose reconstruction, we fitted the following simple function by robust regression (using the nlrob function in the R programming language), where D is dose (Gy), N is the net gene signal (difference between median signals of the gene groups with positive and negative correlations with radiation dose, which varied from 0 to 10 Gy), T is time (days) after irradiation (which varied from 0.25 to 7 days), and k1-k3 are model parameters (see Figure 3 legend for parameter values used):

The structure of equation (1) was not mechanistically motivated, but empirically established based on examination of the data patterns. We evaluated alternative structures, where T was raised to the first or second powers instead of a power of ½, and where the T terms acted multiplicatively rather than additively. We also evaluated fitting these models by an ordinary least squares’ procedure (nls function in R) instead of by the robust procedure. This empirical exploration suggested that the structure of equation (1) fitted by a robust algorithm generated the best performance metrics during random training/testing splits of the data, which are discussed in the Results section.
Animal irradiations and sampling
All animal husbandry and experimental procedures were conducted in accordance with applicable federal and state guidelines and approved by the Animal Care and Use Committees of Columbia University (Assurance Number: A3007-01). C57BL/6NCrl mice were purchased from Charles River Laboratories (Wilmington, MA). For validation of gene expression using the real-time qPCR method, we irradiated 5 male and 5 female young adult (~12 week) C57BL/6 mice to 0, 1, 2, 3, 4, 6 and 8 Gy of x rays using an XRAD-320 Biological Irradiator (Precision X-ray, North Branford, CT) at a dose rate of 1 Gy/min at the Center for Radiological Research. The irradiator is equipped with a custom-made Thoraeus filter (1.25 mm Sn, 0.25 mm Cu, 1.5 mm Al, HVL 4mm Cu) and dose rate from the X-Rad-360 is calibrated periodically using a factory-calibrated Accudose 10x6-6 Ionization Chamber. 24 h after irradiation, we collected blood using cardiac puncture following euthanasia with CO2 and split the blood for blood counts and RNA isolation. The majority volume of blood was added to 4X volume of PAX solution (Becton Dickinson, NJ) and inverted 10X before freezing at ‑80°C. After overnight freezing, the samples were thawed at room temperature and left for >2 h before processing for total RNA isolation using the PAXgene® Blood RNA kit (Qiagen, Valencia, CA) according to the manufacturer’s protocol. Isolated RNA was quantified using the Nanodrop One spectrophotometer (Thermofisher) and A260/280 ratios recorded (average yields of RNA and metrics are shown in Supplementary data table 2).
A 20 mL aliquot of whole blood was processed for immunophenotyping using a CytoFLEX flow cytometer (Beckman Coulter Inc., Brea, CA). Cell counts were quantified by standard flow cytometry methods, using antibodies specific to mouse blood cell surface antigens as follows: Neutrophils (Biolegend, catalog# 127627, Brilliant Violet 421™ anti-mouse Ly-6G), WBC (Biolegend, catalog#103115, APC/Cy7 anti-mouse CD45), B cells (Biolegend catalog#115508, PE anti-mouse CD19) and T cells (Biolegend catalog#100312 APC anti-mouse CD3ε). Flow cytometry data were analyzed with CytExpert 2.3 (Beckman Coulter Inc., Brea, CA).
qRT-PCR analysis of dose reconstruction genes in mouse blood
We prepared complimentary DNA (cDNA) from 1 microgram total mRNA using the High-Capacity® cDNA Kit (Life Technologies, Foster City, CA). Quantitative real-time RT-PCR (qRT-PCR) was performed for the selected genes using Taqman® assays (Life Technologies) using the most 3’ assays as follows: Ccng1 (cyclin G, Mm00438084_m1 ), Aen (apoptosis enhancing nuclease1, Mm00471554_m1 ), Sgta (small glutamine-rich tetratricopeptide repeat (TPR)-containing, alpha, Mm00458535_m1 ), Grn (granulin, Mm00433848_m1), Ccnd1 (cyclin D1, Mm00432359_m1), Phlda3 (pleckstrin homology-like domain, family A, member 3, Mm00449846_m1), Rhoc (ras homolog gene family, member C, Mm00455906_m1), Xdh (xanthine dehydrogenase, Mm00442110_m1), Bax (BCL2-associated X protein, Mm00432051_m1), Cd5l (CD5 antigen-like, Mm00437567_m1), Tcn2 (transcobalamin 2, Mm00443660_m1) and Lrg1 (leucine-rich alpha-2-glycoprotein 1, Mm01278767_m1) in the up regulated gene group. Cd19 (CD19 antigen, Mm00515420_m1), Cxcr5 (chemokine (C-X-C motif receptor 5, Mm00432086_m1), Ly6D (lymphocyte antigen 6 complex, locus D, Mm00521959_m1) and Ccr7 (chemokine (C-C motif) receptor 7, Mm00432608_m1) were in the down regulated group. These genes were selected based on range of signal intensity/relative transcript copy number from the arrays. Geometric mean of Actb (actin B) and Gapdh (GAPDH glyceraldehyde-3-phosphate dehydrogenase) were used for all analyses. All PCRs were performed in duplicate using 15 ng cDNA as input for standard PCR conditions. Relative fold-induction was calculated by the 2−ΔΔCT method using Expression Suite software (Thermofisher) and Microsoft Excel software 56.
We calculated the net signal (N; difference between average signals for genes with positive and negative correlations with dose) for the selected qPCR genes and relative quantification was calculated based on delta Ct measurements (using geomean of two stable housekeeping genes, Actb and Gapdh). In the N calculations we used geometric mean instead of median for the up and down-regulated gene groups because this marginally improved the correlation with dose. We then performed a multiple linear regression to reconstruct dose using the following predictor variables: N, N2, Sex (female=0, male=1), Sex × N, and Sex × N2.
GO and pathway analyses
We used PANTHER database gene expression tools to analyze the gene lists generated by our meta-analysis of the microarray datasets 57. Top biological processes and functions enriched among these genes were identified as those with Bonferroni corrected p-value <0.05. We also uploaded the lists of mouse dose response genes to Ingenuity Pathway Analysis® Software (IPA from Ingenuity®: http://www.ingenuity.com) and performed core analysis for top pathways and biological functions. We also compared gene lists using the Venny tool 58.