I4 launched from Kennedy Space Center’s Launch Complex 39A and traveled into low-Earth orbit for a three-day mission, reaching an orbital altitude of approximately 364 miles before splashing down in the Atlantic Ocean.
Dried Blood Spot (DBS) Pre-flight, In-flight, and Post-flight Sampling. Crew members warmed and massaged their fingertips to maximize blood flow. Fingertips were sterilized (BZK antiseptic towelette, Dynarex, Reorder No. 1303) and punctured using a contact-activated lancet (BD Biosciences, #366593) or a 21-gauge needle (BD Biosciences, #305167). Whatman 903 Protein Saver DBS cards (Cytiva, #10534612) were used to capture, transfer, and then store capillary blood with a desiccant pack (Cytiva, #10548239) at ambient temperature.
DNA Extraction for qPCR-based assessment of Telomere Length. Three 3 mm circular punches were cut from the blood-containing Whatman 903 Protein Saver Cards (cat# WHA10534612) using an Integra Miltex Standard Biopsy Punch (cat# 12-460-406) and placed into a 1.5 mL microcentrifuge tube with sterile tweezers. Samples were prepared using the Qiagen QIAamp DNA Investigator Kit (cat# 56504) following the manufacturer’s protocol for isolation of total DNA from FTA and Guthrie cards. DNA in each sample was quantified fluorometrically with the Qubit 4 Fluorometer (Thermo Fisher Scientific, cat# Q33238) and the 1X dsDNA HS Assay Kit (cat# Q33231). DNA samples were sent to Colorado State University for multiplex qPCR analysis.
Multiplex Quantitative PCR Telomere Length Measurement. MMqPCR measurements of telomere length were carried out by preparing 22 µL of master mix using SYBR green GoTaq qPCR master mix (Promega #A6001) combined with the telomere forward primer (TelG; 5’-ACACTAAGGTTTGGGTTTGGGTTTGGGTTTGGGTTAGTGT-3’), telomere reverse primer (TelC; 5’-TGTTAGGTATCCCTATCCCTATCCCTATCCCTATCCCTAACA-3’), albumin forward primer (AlbU; 5’-CGGCGGCGGGCGGCGCGGGCTGGGCGGAAATGCTGCACAGAATCCTTG-3’), and albumin reverse primer (AlbD; 5’-GCCCGGCCCGCCGCGCCCGTCCCGCCGGAAAAGCATGGTCGCCTGTT-3’) at 10 µM per primer (Integrated DNA Technologies), and RNase/DNase-free water. 3 µL of DNA at 3.33 ng/µL was added for a final volume of 25 µL, with final concentrations of 900 nM for the TelG/C primers and 400 nM for the AlbU/D primers. A Bio-Rad CFX-96 qPCR machine was used to measure telomere length. The cycle design was as follows: 95°C for 3 min; 94°C for 15 s, 49°C for 15 s, for 2 cycles; 94°C for 15 s, 62°C for 10 s, 74°C for 15 s, 84°C for 10 s, and 88°C for 15 s, for 32 cycles. The melting curve was established by a 72°C to 95°C ramp at 0.5°C/second with a 30-second hold. Standard curves were prepared using human genomic DNA (Promega Cat # G3041) with 3-fold dilutions ranging from 50 ng to 0.617 ng in 3 µL per dilution. Negative controls included no-template TelG/C-only and AlbU/D-only reactions and a combined TelG/C and AlbU/D control. Samples were normalized across plates using a human genomic DNA standard. Each sample was run in triplicate in a 96-well plate format, and relative telomere length was established using a telomere (T) to albumin (A) ratio.
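The reaction set-up arithmetic and the T/A calculation described above can be sketched in Python as follows. This is a minimal illustration and not the authors' analysis code: the function names are hypothetical, and the T/A step assumes that each target's standard curve is fitted as a straight line of Cq against log10 of input DNA, which the text does not specify.

```python
import math

def stock_volume_ul(final_nM, final_volume_ul, stock_uM):
    """Volume of primer stock needed for a target final concentration (C1*V1 = C2*V2)."""
    return final_nM * final_volume_ul / (stock_uM * 1000.0)

def dilution_series(top_ng, fold, n_points):
    """Input amounts for a serial dilution starting at top_ng."""
    return [top_ng / fold**i for i in range(n_points)]

def fit_standard_curve(inputs_ng, cqs):
    """Least-squares fit of Cq = slope * log10(ng) + intercept (assumed curve model)."""
    xs = [math.log10(v) for v in inputs_ng]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cqs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cqs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def quantity_from_cq(cq, slope, intercept):
    """Invert the standard curve to get the equivalent input amount in ng."""
    return 10 ** ((cq - intercept) / slope)

def t_over_a(tel_cq, alb_cq, tel_curve, alb_curve):
    """Relative telomere length as telomere quantity over albumin quantity."""
    return quantity_from_cq(tel_cq, *tel_curve) / quantity_from_cq(alb_cq, *alb_curve)

if __name__ == "__main__":
    print(stock_volume_ul(900, 25, 10))   # TelG or TelC stock volume per reaction, in µL
    print(stock_volume_ul(400, 25, 10))   # AlbU or AlbD stock volume per reaction, in µL
    print(dilution_series(50, 3, 5))      # five-point, 3-fold standard series in ng
```

With the stated 10 µM stocks and 25 µL final volume, this gives 2.25 µL per telomere primer and 1.0 µL per albumin primer, and a five-point 3-fold series from 50 ng ends at about 0.617 ng, matching the range given above.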
Whole Genome Extraction and Sequencing
DNA was extracted from the cell pellets of spun-down cfDNA blood collection tubes (Streck, #230470) using the QIAamp Blood Maxi Kit (Qiagen #51192) and then shipped to Element Biosciences for library preparation. The extracted DNA was quantified using the Thermo Fisher Qubit dsDNA HS Assay Kit (cat# Q238253), and 8 samples were prepared using the KAPA HyperPrep Kit and KAPA Unique-Dual Indexed Adapter Kit (cat# 8861919702). The DNA libraries were quantified using the Thermo Fisher Qubit dsDNA HS Assay Kit (cat# Q32854) and sized using the Agilent High Sensitivity DNA Kit (cat# 5067-4626).
The 8 DNA libraries generated with the KAPA HyperPrep Kit were processed using the Adept Library Compatibility Kit (Element Biosciences, Cat# 830-00003), individually circularized with 0.5 pmol (30 µL of 16.67 nM) input, and quantified using the kit-provided qPCR standard and primer mix. The libraries were pooled into 4 separate 2-plex pools, each denatured and sequenced on the Element AVITI system (Element Biosciences, Part #88-00001) using 2x150 paired-end reads with indexing. Primary analysis was performed onboard the AVITI sequencing instrument. FASTQ files were analyzed using a secondary analysis pipeline from Sentieon.
cfDNA Extraction and Sequencing
cfDNA was isolated from 500 µL aliquots of plasma from cfDNA blood collection tubes (Streck, #230470). cfDNA was extracted for each crew member at all timepoints (4 crew members, 6 timepoints, 24 total extractions). cfDNA was extracted using Qiagen’s QIAamp ccf DNA/RNA Kit and eluted in 15 µL Qiagen Elution Buffer per sample. Yield was measured for each sample using the Thermo Fisher Qubit 1X dsDNA HS Assay (cat# Q33231).
The entire extract volume was used as input for library preparation using the NEBNext Ultra II DNA Library Preparation Kit for cfDNA protocol. Each sample was barcoded using NEBNext Multiplex Oligos for Illumina (Unique Dual Index UMI Adaptors, 96 reactions). The final library was eluted in 30 µL, and its concentration was checked using the Thermo Fisher Qubit 1X dsDNA HS Assay (cat# Q33231). Fragment sizes were determined using Agilent’s Tapestation 2100 and D1000 reagents and screentape, with a resulting average fragment size of ~380 bp. 0.25 pmol of each sample.
A total of 24 cfDNA libraries generated with the NEBNext Ultra II DNA Library Preparation kit were processed using the Adept Library Compatibility Kit (Element Biosciences, Cat# 830-00003). Each library was circularized individually with an input range of 0.2–0.5 pmol (30 µL of 6.67–16.67 nM) based on linear library yields. The final circularized libraries were quantified using the qPCR standard and primer mix and pooled into 2 separate 4-plex pools. Each 4-plex pool was denatured and sequenced on the Element AVITI system (Element Biosciences, Part #88-00001) using 2x147 paired reads with 19 bp UMI/index 1 and 8 bp index 2. Primary analysis was performed onboard the AVITI sequencing instrument. FASTQ files were analyzed using a secondary analysis pipeline from Sentieon.
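The circularization inputs quoted above (pmol from a volume at a given nM) follow from simple unit conversion, sketched below in Python. The function names are hypothetical, and the conversion from mass concentration to molarity assumes the common approximation of 660 g/mol per base pair of double-stranded DNA, which is not stated in the text.

```python
def pmol_in_volume(conc_nM, volume_ul):
    """Amount of library in pmol; 1 nM equals 1 fmol/µL, so divide by 1000 for pmol."""
    return conc_nM * volume_ul / 1000.0

def library_nM(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Molarity of a dsDNA library from its mass concentration and mean fragment size.

    Assumes ~660 g/mol per base pair (a standard approximation, not from the text).
    """
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)

if __name__ == "__main__":
    print(pmol_in_volume(16.67, 30))  # upper end of the stated input range
    print(pmol_in_volume(6.67, 30))   # lower end of the stated input range
    # Hypothetical library at 2 ng/µL with the ~380 bp average size reported above.
    print(library_nM(2.0, 380))
```

Here 30 µL at 16.67 nM corresponds to about 0.5 pmol and 30 µL at 6.67 nM to about 0.2 pmol, consistent with the stated input range.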
Clonal Hematopoiesis Targeted Variant Calling
Genomic DNA was obtained from purified, granulocyte-depleted Peripheral Blood Mononuclear Cells using the same protocol as the NASA Twins Study (ref. 6). All samples from the testing and validation cohort were sequenced using a custom-designed DNA sequencing assay (DB0188, VariantPlex, ArcherDX). This panel captures the nine genes most commonly mutated in solid tumor patients following therapeutic radiation, including the full exonic regions of five genes (DNMT3A, TET2, ASXL1, TP53, CHEK2) and targeted exonic regions of four genes (JAK2, SRSF2, SF3B1, PPM1D). Libraries were prepared from 250 ng gDNA using the VariantPlex protocol (ArcherDX Inc., Boulder, CO, USA), which utilizes Anchored Multiplex PCR (AMP) technology to generate target-enriched, sequencing-ready libraries. Following DNA fragmentation, ligation with a universal ArcherDX molecular barcode (MBC) adapter was performed, which tags each DNA molecule with a unique molecular index (UMI) and allows for unidirectional amplification of the sample using gene-specific primers. The resulting libraries were sequenced using a NovaSeq 6000 instrument (Illumina), as per the manufacturer's instructions. UMI consensus was built using Sentieon’s UMI extract tool, and alignment to the GRCh38 reference genome was performed with BWA MEM (v0.7.15). The Sentieon TNscope RNA-seq variant pipeline (v202010) was used for variant calling and for filtering of reads based on mapping quality, depth, and strand bias (ref. 36). BCFtools (v1.9) was used to filter by triallelic sites, short tandem repeats, read quality, and read position bias. Variant Effect Predictor (VEP, v107) and SnpEff (v4.3) were used for annotation of variants and further filtering based on predicted impact of mutations. Data wrangling, tidying, and visualizations were performed using R (v4.1.2), RStudio (v2021.09.2), and libraries (tidyverse, dplyr, data.table, ggplot2).
Whole Genome / cfDNA Preprocessing
Blood and plasma samples were subjected to whole genome and cfDNA short-read sequencing as detailed above. Resultant FASTQ files were validated using FastQC (v0.11.9) and MultiQC (v1.13). Adapters and low-quality bases were trimmed from the 3’ and 5’ ends of reads using Trim Galore (v0.6.5); lower-quality reads were classified and removed, retaining only those reads with length ≥ 25 bp and Phred read quality ≥ 20. Reads were aligned against the hg38 human reference genome with BWA MEM (v0.7.15) and subjected to standard QC and deduplication procedures as part of Sentieon’s TNscope (v202010) DNAseq workflow.
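To make the retention thresholds concrete, the following Python sketch applies a length and Phred-quality filter to a single read. It is an illustration only, not a reimplementation of Trim Galore: the function names are hypothetical, Phred+33 encoding is assumed, and the quality criterion is interpreted here as the mean per-base quality of the read, which is a simplification.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]

def error_probability(q):
    """Base-call error probability implied by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def passes_filter(seq, qual, min_len=25, min_q=20):
    """Keep a read only if it is long enough and its mean quality is high enough.

    The mean-quality rule is an assumption of this sketch.
    """
    if len(seq) < min_len:
        return False
    scores = phred_scores(qual)
    return sum(scores) / len(scores) >= min_q

if __name__ == "__main__":
    print(error_probability(20))              # error rate implied by Phred 20
    print(passes_filter("A" * 30, "I" * 30))  # long, high-quality read
    print(passes_filter("A" * 20, "I" * 20))  # too short
```

A Phred score of 20 corresponds to an expected error rate of 1 in 100 base calls, which is why it is a common lower bound for retained reads.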
Whole Genome / cfDNA / Single cell Variant Calling
Aligned and preprocessed reads were subjected to the TNscope variant calling pipeline. Calls were filtered using Fisher’s exact test and subset to SNP variants using samtools (v1.16.1), then filtered by triallelic sites, short tandem repeats, read quality, and read position bias using BCFtools (v1.16). Variant Effect Predictor (VEP, v107) was used for annotation of variants and further filtering based on predicted impact of mutations. Resulting coordinates were processed into allele and gene frequency matrices and visualized in R using the tidyverse (v1.3.2) suite of packages.
cfDNA Fragment Analysis
Fragment size distribution was calculated using the bamPEFragmentSize tool from the deepTools Python package (v3.5.1). Levels of cfDNA (read counts) originating from different chromosomes were normalized by chromosome length and total number of reads in the library, yielding a Reads Per Kilobase per Million reads (RPKM) measurement. The fraction of cell-free mtDNA relative to chromosomal cfDNA in plasma was compared and visualized in R using the tidyverse (v1.3.2) suite of packages.
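The RPKM normalization described above can be written out explicitly. The Python sketch below is illustrative: the function names and the read counts are hypothetical, and the total read count is taken as the sum over the supplied chromosomes, which is an assumption about how the library total is defined.

```python
def rpkm(read_count, region_length_bp, total_reads):
    """Reads Per Kilobase per Million reads: count / (length in kb * total reads in millions)."""
    return read_count * 1e9 / (region_length_bp * total_reads)

def per_chromosome_rpkm(counts, lengths_bp):
    """RPKM for each chromosome, using the sum of all supplied counts as the library total."""
    total = sum(counts.values())
    return {chrom: rpkm(n, lengths_bp[chrom], total) for chrom, n in counts.items()}

if __name__ == "__main__":
    # Hypothetical read counts; lengths are the GRCh38 chr21 and rCRS chrM sizes.
    counts = {"chr21": 400_000, "chrM": 2_000}
    lengths = {"chr21": 46_709_983, "chrM": 16_569}
    for chrom, value in per_chromosome_rpkm(counts, lengths).items():
        print(chrom, round(value, 2))
```

Because the mitochondrial genome is only about 16.6 kb, even a small number of mtDNA reads produces a high RPKM, so length normalization matters when comparing mtDNA with chromosomal cfDNA.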
cfDNA Tissue of Origin Deconvolution
The enrichment of cfDNA fragments from various tissues was calculated by read coverage depletion analysis at transcription start sites (TSSs) to estimate nucleosome positioning and infer gene expression. The pipeline is described in detail in Bezdan et al. 2020 (ref. 7). The resulting nucleosome periodicity was correlated with (1) a per-tissue gene expression reference matrix retrieved from the Human Proteome Map (HPM; Kim et al. 2014, ref. 11) or (2) individual astronaut pseudo-bulk expression of different cell subpopulations extracted from the PBMC scRNA-seq dataset. In both cases, the tissue/subpopulation-periodicity correlations were ranked by Pearson's correlation coefficient, clustered (using Ward's method with Euclidean distances), and visualized in R using the tidyverse (v1.3.2) and ComplexHeatmap (v2.14.0) packages.
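The correlation-and-ranking step can be sketched in Python as below. This is a simplified illustration rather than the pipeline from the cited work: the function names and toy values are hypothetical, each vector is assumed to hold one value per gene in a shared gene order, the ascending sort order is an arbitrary choice of this sketch, and the Ward clustering step is omitted.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def rank_tissues(periodicity, expression_by_tissue):
    """Correlate per-gene periodicity with each tissue's expression and sort by r."""
    scores = {
        tissue: pearson(periodicity, expression)
        for tissue, expression in expression_by_tissue.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])

if __name__ == "__main__":
    # Toy per-gene values, for illustration only.
    periodicity = [0.9, 0.4, 0.7, 0.1]
    reference = {
        "tissue_A": [1.0, 5.0, 2.0, 8.0],
        "tissue_B": [7.0, 3.0, 6.0, 1.0],
    }
    for tissue, r in rank_tissues(periodicity, reference):
        print(tissue, round(r, 3))
```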
Longitudinal Gene Expression Analyses
Longitudinal single-cell data were processed in R using the Seurat package (v4.3.0) to normalize, scale, and cluster cell populations. Cell identities were determined through computational gating, with inclusion parameters based on gene expression of key markers, similar to gating in Fluorescence-Activated Cell Sorting. CD8+ T cells were selected from the PBMC population by filtering for CD3D+ CD8A+ cells, CD4+ T cells from CD3D+ CD8A+ cells, CD14+ monocytes from CCR2+ CD14+ cells, CD16+ monocytes from CD14+ CD16+ cells, NK cells from NCAM1+ CD3- and NCR3+ CD3- cells, and DCs from CD86+ CD83+ cells. The Seurat package (v4.3.0) was used for normalization, scaling, and differential gene expression analysis of sn-DNA data. Data wrangling, tidying, and visualizations were performed using R (v4.1.2), RStudio (v2021.09.2), and libraries (tidyverse, dplyr, data.table, ggplot2).
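Marker-based computational gating of this kind can be illustrated with a short Python sketch. It is not the Seurat workflow used in the study: the names are hypothetical, a marker is treated as positive when its expression exceeds a threshold of zero (an assumption), CD3D is used as the stand-in gene for the CD3- criterion (also an assumption), and only a subset of the gates listed above is shown.

```python
# Each label maps to one or more alternative gates of (positive markers, negative markers).
GATES = {
    "CD8+ T cell": [({"CD3D", "CD8A"}, set())],
    "CD14+ monocyte": [({"CCR2", "CD14"}, set())],
    "NK cell": [({"NCAM1"}, {"CD3D"}), ({"NCR3"}, {"CD3D"})],
    "DC": [({"CD86", "CD83"}, set())],
}

def gate_cell(expression, threshold=0.0):
    """Return every label whose gate a cell satisfies.

    `expression` maps gene symbol to normalized expression; missing genes count as 0.
    """
    labels = []
    for label, alternatives in GATES.items():
        for positive, negative in alternatives:
            has_positive = all(expression.get(g, 0.0) > threshold for g in positive)
            lacks_negative = all(expression.get(g, 0.0) <= threshold for g in negative)
            if has_positive and lacks_negative:
                labels.append(label)
                break  # one matching alternative is enough for this label
    return labels

if __name__ == "__main__":
    print(gate_cell({"CD3D": 2.1, "CD8A": 1.4}))   # satisfies the CD8+ T cell gate
    print(gate_cell({"NCR3": 0.8}))                # satisfies the second NK gate
    print(gate_cell({"NCAM1": 1.0, "CD3D": 0.5}))  # CD3D expression excludes the NK gate
```

Returning a list rather than a single label makes cells that satisfy more than one gate visible, so ambiguous cells can be inspected or excluded explicitly.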