Phylogenetic affiliations and genomic characterization of novel bacterial species and their abundance in the International Space Station

Background With the advent of long-term human habitation in space and on the moon, understanding how the built environment microbiome of space habitats differs from Earth habitats, and how microbes survive, proliferate and spread in space conditions, is becoming increasingly important. The Microbial Tracking mission series has been monitoring the microbiome of the International Space Station (ISS) for almost a decade. During this mission series, six unique strains of Gram-positive bacteria, including two spore-forming and three non-spore-forming species, were isolated from the environmental surfaces of the ISS. Results The analysis of their 16S rRNA gene sequences revealed <99% similarities with previously described bacterial species. To further explore their phylogenetic affiliation, whole genome sequencing (WGS) was undertaken. For all strains, the gyrB gene exhibited <93% similarity with closely related species, which proved effective in categorizing these ISS strains as novel species. Average nucleotide identity (ANI) and digital DNA-DNA hybridization (dDDH) values, when compared to any known bacterial species, were less than 94% and 50%, respectively, for all species described here. Traditional biochemical tests, fatty acid profiling, polar lipid, and cell wall composition analyses were performed to generate phenotypic characterization of these ISS strains. A study of the shotgun metagenomic reads from the ISS samples, from which the novel species were isolated, showed that only 0.1% of the total reads mapped to the novel species, supporting the idea that these novel species are rare in the ISS environments. In-depth annotation of the genomes unveiled a variety of genes linked to amino acid and derivative synthesis, carbohydrate metabolism, cofactors, vitamins, prosthetic groups, pigments, and protein metabolism.
Further analysis of these ISS-isolated organisms revealed that, on average, they contain 46 genes associated with virulence, disease, and defense. The main predicted functions of these genes are conferring resistance to antibiotics and toxic compounds, and enabling invasion and intracellular resistance. antiSMASH analysis identified roughly 16 cluster types across the six strains, including β-lactone and type III polyketide synthase (T3PKS) clusters. Conclusions Based on these multi-faceted taxonomic methods, it was concluded that these six ISS strains represent five novel species, which we propose to name as follows: Arthrobacter burdickii IIF3SC-B10T (=NRRL B-65660T), Leifsonia virtsii F6_8S_P_1AT (=NRRL B-65661T), Leifsonia williamsii F6_8S_P_1BT (=NRRL B-65662T and DSMZ 115932T), Paenibacillus vandeheii F6_3S_P_1CT (=NRRL B-65663T and DSMZ 115940T), and Sporosarcina highlanderae F6_3S_P_2T (=NRRL B-65664T and DSMZ 115943T). Identifying and characterizing the genomes and phenotypes of novel microbes found in space habitats, like those explored in this study, is integral for expanding our genomic databases of space-relevant microbes. This approach offers the only reliable method to determine species composition, track microbial dispersion, and anticipate potential threats to human health from monitoring microbes on the surfaces and equipment within space habitats. By unraveling these microbial mysteries, we take a crucial step towards ensuring the safety and success of future space missions.


Introduction
Multiple low-Earth orbit and lunar orbit space habitats are being planned by governmental and commercial entities as part of the newly revitalized space industry in the 2020s. Cleaning practices and microbial monitoring to ensure crew safety will likely be based on information garnered from studies of the ISS. In the ongoing Microbial Tracking (MT) investigation of the ISS [7,8], out of 510 genomes sequenced, 56 microbial species have been isolated multiple times, representing 27 microbial genera (19 bacteria and 8 fungi). Dominant microbial species include bacterial genera Staphylococcus, Pseudomonas, Bacillus, and Acinetobacter, yeast genus Rhodotorula, and fungal genera Penicillium, Aureobasidium, and Aspergillus [9-25]. In this communication, six strains isolated from various ISS environmental surfaces belonging to five novel bacterial species constituting four different genera (Arthrobacter, Leifsonia, Paenibacillus and Sporosarcina) are described. Since recent bacterial taxonomy heavily depends on genomic characterization, whole genome sequences (WGS) of these novel species were generated and compared with publicly available genomes of closely related species.
One of the objectives of this study was to establish these six strains as novel species, for which chemotaxonomic, phenotypic, physiological, and phylogenetic (using taxonomic marker genes) analyses were carried out. To determine the phylogenetic placement within their respective genera, we employed multiple gene analyses and WGS-based phylogenies containing all shared single-copy core genes. The second objective was to quantify the abundance of these recently identified species in the metagenomes of ISS surfaces. Hence, an in-depth analysis was conducted on the metagenomes collected during seven different flight missions, consisting of 106 individual samples, to assess the presence and prevalence of the species. In addition, an attempt was made to generate metagenome-assembled genomes (MAGs) of these novel species from the ISS metagenomes. Finally, antiSMASH analysis was performed to identify, annotate, and analyze secondary metabolite biosynthesis gene clusters (BGCs) in the genomes of these novel bacterial species. Specialized metabolites and natural products encoded by these novel bacterial BGCs might provide insight into how different bacterial species thrive and interact in ISS conditions.

Sample collection and isolation
During the MT-1 and MT-2 mission series, samples were collected from the same set of eight surfaces aboard the ISS using pre-packaged and pre-sterilized wipes [7,8]. Upon return to Earth, the samples were grown on blood agar (37°C for 24 hrs) or R2A (25°C for 7 days) media. Among other strains belonging to previously defined species, six strains isolated from the Advanced Resistive Exercise Device (ARED) platform and crew quarters could not be assigned to a known species and were suspected to be novel. Preliminary 16S rRNA sequence analysis indicated that strains isolated during Flight 3 (n = 1) from the ARED platform (IIF3SC-B10) and Flight 6 (n = 5) from both the ARED platform (F6_3S_P_1B and F6_3S_P_1C) and crew quarters (F6_8S_P_1A, F6_8S_P_1B and F6_8S_P_1C) belonged to four different genera: Arthrobacter, Leifsonia, Paenibacillus, and Sporosarcina. However, further WGS analysis was required to identify the strains at the species level.

Light microscopy and SEM
A liquid culture of spore-forming strains was heat shocked (80°C for 10 minutes), then plated on TSA and grown at 30°C for 5 days to induce endospore formation. Endospore staining with malachite green and safranin was performed using the Schaeffer-Fulton method [26]. Light microscopy and phase contrast images were taken on an Olympus BX53 microscope with an Olympus DP25 camera using Olympus cellSens software.
All strains were grown on TSA media at 24°C for 72 hours. An isolated colony was fixed in a 4°C solution of 2.5% glutaraldehyde (Ted Pella Inc., Redding, CA, United States) in 0.1M Sodium Cacodylate (NaCaco) (Sigma Aldrich) for an hour. The suspended cells were collected using a vacuum pump and a 0.2 µm Isopore filter membrane (MilliporeSigma, Burlington, MA, United States), and then transferred into a 1.5 ml centrifuge tube. The sample was incubated in 0.1M NaCaco solution at 4°C for 10 minutes, after which the solution was replaced with fresh 0.1M NaCaco solution; this washing step was repeated a total of 3 times. The sample was then dehydrated through a step series of increasing IPA concentrations in water at 10 min intervals. The series was: 50%, 70%, 80%, 90%, 95% and 100%, with the final 100% rinse performed 3 times. The sample was stored at 4°C in 100% IPA until it was critical point dried in a Tousimis Automegasamdri 915B critical point dryer (Rockville, MD, United States). Samples were affixed with carbon tape to SEM stubs (Ted Pella Inc.) and then coated with a ~12 nm thick carbon layer by a Leica EM ACE600 Carbon Evaporator (Deerfield, IL, United States). SEM images were collected on a FEI Quanta 200F microscope (Thermo Fisher, Waltham, MA, United States) located at the California Institute of Technology's Kavli Nanoscience Institute.

Biochemical tests and phenotype characterization
Growth temperature and other phenotypic parameters of the tested strains were assessed as follows. Bacterial strains were inoculated in both solid (R2A plates) and liquid (trypticase soy broth (TSB), BD Diagnostics Cat # 257107) media in 15 ml loose-capped centrifuge tubes and grown at temperatures of 4, 15, 25, 30, 37, and 45°C. Growth on plates and in tubes was monitored daily for 7 days, and incubation was halted if growth was observed. Samples grown at 4°C and 15°C were incubated for an additional 4 weeks and 2 weeks, respectively, for final growth assessment. Salt tolerance was determined by inoculating the strains onto R2A plates containing 0-5% added NaCl, as well as agar containing only peptone plus 0 or 1% NaCl, and examining growth after 7 days of incubation at 30°C. Oxidase activity was determined by using OxiDrops™ Liquid Oxidase Reagent (Hardy Diagnostics) on solid culture. Catalase activity was determined by observing effervescence when bacterial colonies were mixed with hydrogen peroxide on a sterile glass slide. Finally, pH tolerance (4 to 10) was tested by adjusting the pH of TSB broth with biological buffers, as described by Xu et al. [27].
Biochemical tests were performed using a Gram-positive identification card (Vitek 2 GP ID, bioMérieux) according to the manufacturer's protocol. Briefly, freshly grown colonies were transferred aseptically into the saline (aqueous 0.45-0.50% NaCl, pH 4.5 to 7.0) tube to prepare a homogenous suspension with a density equivalent to a McFarland No. 0.50 to 0.63 using a calibrated VITEK® 2 DensiCHEK™ Plus. The suspension tube and Vitek 2 GP ID card were placed in the cassette and incubated at 37°C. Data entry, cassette loading to the instrument and retrieval of raw data were done according to the VITEK instrument user manual. Test results were recorded within 10 h of inoculation.
A phenotypic fingerprint was generated using the GNIII MicroPlate according to BioLog's protocol. Briefly, freshly grown colonies were transferred aseptically into inoculum solution A (Cat #. 72401; BioLog) to prepare a homogenous suspension with a density equivalent to a McFarland No. 0.50. The inoculum was loaded onto a BioLog GNIII MicroPlate (100 µl per well) and incubated at 37°C for 24 h. OmniLog values (A590-A750) were recorded after a minimum of 10 h of incubation using a MicroPlate reader (FLUOstar Omega, BMG Labtech, Germany).

Chemotaxonomy
To analyze the fatty acid methyl esters (FAME), cells were grown on Tryptic Soy Agar/Sabouraud Dextrose Agar (SDA) at 30°C for 48 hours until they reached mid-exponential growth phase. The harvested cell biomass was subjected to saponification, methylation, and extraction [28] for fatty acid analysis, which was carried out using the Microbial Identification System (MIDI) [29] with the Aerobe (RTSBA6) database (Sherlock version 6.0) following the standard protocol [30]. A gas chromatograph (Agilent 7890A) with a flame ionization detector was used for FAME analysis, and identification and comparison of the results were made using the MIDI System.
To extract and analyze quinones, polar lipids and peptidoglycans, cells were cultivated on TSA/SDA for 3 days at 30°C. The polar lipids and quinones were extracted and analyzed by two-dimensional thin-layer chromatography (2D-TLC) [31]. To visualize the different classes of polar lipids, the developed TLC plates were treated with 10% (w/v) ethanolic phosphomolybdic acid for total lipids, 0.2% (w/v) ninhydrin in butanol for aminolipids (specific for amino groups), Dittmer and Lester's Zinzadze reagent for phospholipids (specific for phosphates), and alpha-naphthol for glycolipids (specific for sugars). Peptidoglycans were extracted and analyzed for the diagnostic amino acids from whole cells [32].

DNA Extraction and whole genome sequencing
Genomic DNA was extracted from the novel species using the ZymoBIOMICS DNA MagBead kit, following the manufacturer's instructions. Libraries for WGS were prepared with an Illumina Nextera DNA Flex library preparation kit as described earlier [10]. Sequencing of prepared libraries was carried out on a NovaSeq 6000 S4 flow cell paired-end 2 x 150-bp platform, and the reads were quality filtered and trimmed using FastQC v.0.11.7 [33]. Adapter sequences were removed using fastp v0.20 [34]. Draft genomes were assembled using SPAdes v.3.11.1 [35] up to the scaffold level, and the assembly quality was evaluated using QUAST v.5.0.2 [36]. Default settings were used for all steps except fastp, which was run with screening for 512 adapters.
In addition to Illumina WGS, a secondary round of sequencing was carried out for strains IIF3SC-B10 T and F6_8S_P_1B T using Oxford Nanopore sequencing.

ANI and dDDH analyses
To elucidate the species affiliation of the isolated genomes, we retrieved all validly described and representative genomes of the four identified genera from the NCBI database using the command-line tool 'bit' [39]. We used Average Nucleotide Identity (ANI) and digital DNA-DNA hybridization (dDDH) to perform pairwise nucleotide-level comparisons. For ANI calculations, we employed FastANI v.1.33, a rapid alignment-free computational method, with the novel species as the query against the other genomes. To estimate the in-silico DNA-DNA hybridization, we used the recommended formula 2 of the Genome-to-Genome Distance Calculator (GGDC) v.3.0 with the BLAST+ alignment tool [40]. To further evaluate the genetic relatedness of the genomes of the genus Sporosarcina, ANI-based analysis was used. All available genomes on the NCBI GenBank database under the genus Sporosarcina (n = 93) were retrieved using ncbi-genome-download (v.0.3.1). An all-vs-all ANI analysis of the genomes was conducted and drawn using ANIclustermap (v.1.2.0).
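As an illustration of how a query-versus-reference ANI comparison might be summarized, the sketch below parses tab-separated FastANI-style output (query, reference, ANI, matched fragments, total query fragments) and reports the closest reference. The file names, fragment counts, and second row are invented for illustration; only the 93.8% value for A. sedimenti is taken from this study. It is not the authors' script.

```python
import csv
import io

# Hypothetical FastANI-style output. Columns (tab-separated):
# query, reference, ANI (%), matched fragments, total query fragments.
# File names, fragment counts, and the second row are invented.
EXAMPLE_OUTPUT = (
    "IIF3SC-B10.fna\tA_sedimenti.fna\t93.8\t900\t1200\n"
    "IIF3SC-B10.fna\tA_agilis.fna\t85.1\t700\t1200\n"
)


def best_ani_hit(fastani_text):
    """Return (reference, ani) for the highest-ANI row, or None if no rows."""
    rows = csv.reader(io.StringIO(fastani_text), delimiter="\t")
    hits = [(row[1], float(row[2])) for row in rows if len(row) >= 3]
    return max(hits, key=lambda hit: hit[1]) if hits else None


reference, ani = best_ani_hit(EXAMPLE_OUTPUT)
print(f"closest reference: {reference} (ANI {ani:.1f}%)")
```

Keeping the parser separate from the ranking step makes it straightforward to reuse the same rows for an all-vs-all summary.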

Phylogenetic analysis
The 16S sequences of the novel species were extracted from their WGS. Phylogenetic trees were constructed for each genus by incorporating publicly available 16S sequences of all the species within the respective genus. In cases where only the WGS was publicly available, a BLAST wrapper script was employed to extract the 16S sequence. The trees were rooted using a related species within the same family. The DECIPHER package was used to align and trim the 16S sequences [41]. Phylogenetic trees were built with the phangorn package [42] using maximum likelihood, with model selection based on AIC values and 1,000 bootstrap replicates. The trees were visualized using Interactive Tree of Life (iTOL) [43].
We created whole genome-based phylogenies to identify closely related species of the isolated genomes. We used GToTree v.1.7.07 [44], a Hidden Markov Model (HMM) based command-line tool which aligns identified single-copy genes using Muscle v.3.8 and produces a concatenated protein alignment. For the Arthrobacter and Leifsonia genomes, we used the 138 single-copy gene (SCG) set of Actinobacteria, while for the Paenibacillus and Sporosarcina genomes, we used the 119 SCG set of Firmicutes. IQ-TREE v.2.2.0.3 with ModelFinder-Plus was then used to construct the phylogenetic tree from the protein alignment with 1,000 ultrafast bootstrap replicates [45-47]. We further retrieved 4,552 complete, non-anomalous, representative genomes of bacteria from the NCBI Reference Sequence (RefSeq) database and constructed a phylogenetic tree of life along with the novel species using a 16 SCG set as previously described [48] (data not shown). Genomes containing at least 40% of the total 16 SCG targets were placed in the phylogenetic tree. All the trees were further annotated and visualized using Interactive Tree Of Life (iTOL) v.6.7 [49].
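The 40% inclusion rule for the 16-gene set can be made concrete with a small sketch. The genome labels and hit counts below are hypothetical, and the function is only an illustration of the stated threshold, not the filtering code of any specific tool.

```python
TOTAL_SCG_TARGETS = 16  # size of the SCG set named in the text
MIN_FRACTION = 0.40     # "at least 40%" threshold from the text


def passes_scg_filter(hits_found, total=TOTAL_SCG_TARGETS,
                      min_fraction=MIN_FRACTION):
    """True if the genome carries at least min_fraction of the SCG targets."""
    return hits_found / total >= min_fraction


# Hypothetical genomes with the number of SCG targets detected in each.
genomes = {"genome_A": 16, "genome_B": 7, "genome_C": 6}
kept = [name for name, hits in genomes.items() if passes_scg_filter(hits)]
print(kept)
```

Because 40% of 16 is 6.4, a genome needs at least 7 detected targets to be placed in the tree under this rule.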

Mapping ISS metagenome sequence reads to isolated novel species
To investigate the presence of the isolated novel species in environmental samples from the ISS, we retrieved paired-end metagenomic reads from two microbial tracking (MT) missions, MT-1 (n = 42) and MT-2 (n = 64), from the NCBI Sequence Read Archive with project accessions PRJNA438545 [4] and PRJNA781277 [8], respectively. Quality filtering of the metagenomes was performed using Trimmomatic v.0.39 with a sliding window of 4 bases and an average quality per base cutoff of 20 [50]. We used MetaCompass v.2.0 [51] to perform reference-guided assembly of the aligned metagenome reads against the isolated genomes of the novel species. We quantified the number of reads that aligned to the isolated genomes and assessed the breadth of coverage of the assembled reads in each sample. We further tried to bin the contigs using MetaBAT v.2.12.1, but were unable to resolve any MAGs.
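The sliding-window criterion described above (window of 4 bases, mean quality cutoff of 20) can be sketched as follows. This is a simplified illustration of the idea, not Trimmomatic's implementation: the real tool may treat the bases at the cut boundary and very short reads differently, and the function name and example quality scores are hypothetical.

```python
def sliding_window_trim(qualities, window=4, min_mean_quality=20):
    """Return how many leading bases to keep.

    Scans windows from the 5' end and cuts the read at the start of the
    first window whose mean quality falls below min_mean_quality.
    """
    n = len(qualities)
    if n < window:
        # Assumption for this sketch: reads shorter than one window are kept.
        return n
    for start in range(n - window + 1):
        window_sum = sum(qualities[start:start + window])
        if window_sum < min_mean_quality * window:
            return start
    return n


# Hypothetical Phred scores for a 10-base read.
example_quals = [30, 30, 30, 30, 30, 10, 10, 10, 10, 30]
keep = sliding_window_trim(example_quals)
print(f"keep the first {keep} bases")
```

In this example the window starting at index 3 has a mean of exactly 20 and passes, while the next window (mean 15) fails, so the read is cut there.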

Genome annotation and prediction of secondary-metabolite biosynthetic potential
Gene prediction and annotation of the novel genomes were performed using the Rapid Annotations based on Subsystem Technology (RAST) online server with the RAST-tk annotation scheme [52]. The Resistance Gene Identifier (RGI) v6.0.1 web portal, which utilizes the CARD v3.2.6 database, was used to identify antibiotic resistance genes and markers in the novel species from ISS environments, retaining only "Perfect" and "Strict" matches. Secondary metabolite biosynthetic gene clusters (BGCs) were identified within each novel genome using antiSMASH v.7.0.0 with the "strict" detection setting [53]. The identified BGCs were curated for functional annotation using the MIBiG v.3.1 JSON file via an in-house Python script [54].
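Restricting resistance-gene hits to "Perfect" and "Strict" matches amounts to a simple filter on the match category. The sketch below assumes each hit has already been loaded into a dictionary with a hypothetical `cutoff` field; the gene labels are placeholders and the record layout is not the RGI output schema.

```python
ACCEPTED_CUTOFFS = {"Perfect", "Strict"}


def keep_confident_hits(hits):
    """Keep only hits whose match category is Perfect or Strict."""
    return [hit for hit in hits if hit["cutoff"] in ACCEPTED_CUTOFFS]


# Hypothetical hit records; "gene" and "cutoff" are illustrative field names.
example_hits = [
    {"gene": "gene_1", "cutoff": "Perfect"},
    {"gene": "gene_2", "cutoff": "Loose"},
    {"gene": "gene_3", "cutoff": "Strict"},
]
confident = keep_confident_hits(example_hits)
print([hit["gene"] for hit in confident])
```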

Results
Genome characteristics and relatedness indices
The six strains isolated during this study belonged to five novel species spanning four bacterial genera. Among the six strains, four were non-spore-forming and two formed endospores. 16S rRNA gene similarity alone did not resolve all strains as novel species, but ANI/dDDH analyses placed them into five distinct bacterial species: Arthrobacter burdickii IIF3SC-B10 T, Leifsonia virtsii F6_8S_P_1A T, Leifsonia williamsii F6_8S_P_1B T, Paenibacillus vandeheii F6_3S_P_1C T, and Sporosarcina highlanderae F6_3S_P_2 T. In addition, the WGS of S. thermotolerans CCUG 53480 T was generated and compared with the genome of S. highlanderae F6_3S_P_2 T to identify the variable, conserved, and distinctive genomic traits. Table 1 summarizes the assembly statistics for all six strains. The draft genomes of the novel species were constructed with high-quality sequences, with assembly quality ranging from as few as 1 contig to 49 scaffolds. The genome sizes of all strains were < 4.2 Mb, except for P. vandeheii F6_3S_P_1C T, which had a genome length of ~7 Mb. The non-spore-forming strains had high GC contents, ranging from 68 to 71.4%, whereas the spore-forming strains P. vandeheii F6_3S_P_1C T (46.1%) and S. highlanderae F6_3S_P_2 T (41.6%) had low GC contents. The total number of predicted genes was 2,166 for S. highlanderae F6_3S_P_2 T and 4,861 for P. vandeheii F6_3S_P_1C T, whereas the non-spore-forming bacterial species had ~3,444 to 3,850 coding regions (Table 1). The complete genome of S. thermotolerans CCUG 53480 T was included in the comparison because it shares > 99% 16S rRNA gene sequence similarity with S. highlanderae F6_3S_P_2 T.

Table 1 Assembly statistics for novel bacterial species isolated from the ISS and for the type strain of Sporosarcina thermotolerans.

[Table 1 columns: Species, Strain #, NCBI Accession #, Isolation location]

Table 2 presents the similarities among closely related members of the novel species based on ANI, dDDH, and two marker genes (16S rRNA and gyrB). The 16S rRNA gene sequences of all five novel species described in this study exhibited > 99% similarities to previously established species. However, the gyrB gene sequence similarities of the novel species with the closely related species ranged from 88.6 to 92.8% and could serve as a genetic marker to distinguish the novel species. Moreover, ANI indices (< 95%) and dDDH values (< 70%) fell below the threshold levels of bacterial species identity and confirmed that the examined ISS strains were novel species.
In the 16S rDNA-based phylogenetic tree encompassing all Arthrobacter species (Fig. 1A), with Micrococcus antarcticus as the outgroup, strain IIF3SC-B10 T clustered together with officially named species such as A. agilis, A. cheniae, A. bussei, and A. antioxidans. However, in the WGS-based tree (Fig. 1B), constructed using a concatenated alignment of gene clusters from 59 genomes containing 138 single-copy core genes common to all Actinobacteria, A. burdickii was found to be distinct from the A. ruber and A. cheniae clades. Instead, it exhibited closer similarity to the unrecognized species A. sedimenti (ANI 93.8%).
The 16S rRNA gene sequences of L. virtsii F6_8S_P_1A T, isolated from crew quarters in Flight 6, exhibited > 98.9% similarity to L. soli and L. shinshuensis, indicating that the 16S rRNA gene is not a suitable marker for distinguishing members of this genus. Upon comparing the gyrB gene sequences of the members of this phylogenetic clade, the similarity values were below 91.7%. In addition, strain F6_8S_P_1A T exhibited low dDDH values (< 28.3%) and ANI indices (86.3%), providing further evidence that it belongs to a novel species. In the 16S rDNA-based phylogenetic tree of all Leifsonia species, with Rathayibacter tritici as the outgroup, strain F6_8S_P_1A T was placed within a clade that also included the validly described species L. aquatica, L. xyli, L. lichenia, L. shinshuensis, and L. soli (Fig. 2A). However, in the WGS-based tree (Fig. 2B), which was constructed using a concatenated alignment of gene clusters of eight available genomes containing a total of 138 single-copy core genes common to all species in the Actinobacteria, L. virtsii was separated from all these Leifsonia species. The next closest member was L. soli (ANI 86.3%).
In addition to L. virtsii, two strains were identified as L. williamsii based on their gyrB sequence similarity (91.6%), ANI index (84.3%), and dDDH value (24.7%), which were below the species threshold level. Surprisingly, despite the high 16S rRNA gene sequence similarity between L. virtsii and L. williamsii (99.2%), the 16S rRNA gene tree (Fig. 2A) placed them in different clades, supported by 88% bootstrap values. Notably, the L. williamsii strains were isolated from the same crew quarter location as L. virtsii, and they even originated from the same culture plate of R2A medium. Initially, there was a suspicion that they might be clones from the same colony, but further analysis using WGS and gyrB sequencing confirmed that they were distinct novel species. In contrast to the 16S rRNA gene phylogeny, the WGS-based tree (Fig. 2B) clearly differentiated L. williamsii from L. virtsii.
The 16S rRNA gene sequences of P. vandeheii F6_3S_P_1C T, isolated from the ARED's surface during Flight 6, exhibited 99.5% similarity to P. pabuli, indicating that the 16S rRNA gene is not a suitable marker for distinguishing members of this genus. Upon comparing the gyrB gene sequences, P. pabuli also exhibited 94.9% similarity with P. vandeheii F6_3S_P_1C T. Because this value is close to the ~95% gyrB cut-off established for species delineation, WGS was performed, which showed that the ANI index was only 88.4%. Based on the low ANI index and dDDH value (34.6%), P. vandeheii F6_3S_P_1C T is differentiated from P. pabuli and described as a novel species. The 16S rRNA gene-based phylogeny (Fig. 3A) showed that P. xylanivorans, P. taichungensis, and P. pabuli formed a tight clade with > 99% similarities among them. However, in the WGS-based tree (Fig. 3B), which was constructed using a concatenated alignment of gene clusters from 244 genomes containing a total of 119 single-copy core genes common to all species in the Firmicutes, P. vandeheii was separated from all these Paenibacillus species and was found to be closest to P. xylanivorans (ANI 92.8%).
Strain F6_3S_P_2 T, another spore-forming bacterium belonging to the genus Sporosarcina and isolated from the ARED surface during Flight 6, displayed 99.8% similarity to S. thermotolerans based on the 16S rRNA gene. This finding highlights the difficulty in classifying spore-forming microorganisms using the 16S rRNA gene marker. Hence, the WGS of S. thermotolerans CCUG 53480 T was needed to identify the phylogenetic position of S. highlanderae F6_3S_P_2 T.
Upon comparing the gyrB gene sequences, S. highlanderae F6_3S_P_2 T exhibited 87.0% similarity with S. thermotolerans CCUG 53480 T. Furthermore, the ANI index between the genomes of F6_3S_P_2 T and S. thermotolerans CCUG 53480 T was only 85.3%. Considering the low ANI index and dDDH value (29.8%), S. highlanderae F6_3S_P_2 T can be identified as a novel species, distinct from S. thermotolerans. In the 16S rRNA gene-based phylogeny (Fig. 4A), S. thermotolerans, S. luteola, and S. saromensis formed a closely related clade with > 99% similarities among them. However, S. koreensis did not cluster within this clade, despite having a 16S rRNA gene similarity with S. highlanderae of > 98.8%. On the other hand, in the WGS-based tree (Fig. 4B), constructed using a concatenated alignment of gene clusters from 15 genomes containing 119 single-copy core genes common to all species in the Firmicutes, S. highlanderae was separated from all other Sporosarcina species, with S. thermotolerans as its closest relative (ANI 85.3%).
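The species-delineation logic used in this section (ANI < 95% and dDDH < 70% against the closest relatives) can be checked against the values quoted above with a minimal sketch. The dictionary holds only values stated in the text; for L. virtsii the dDDH figure is the reported upper bound, and A. burdickii is omitted because its dDDH value is not quoted in this section. The function name is hypothetical.

```python
ANI_SPECIES_THRESHOLD = 95.0   # % ANI, species boundary cited in the text
DDDH_SPECIES_THRESHOLD = 70.0  # % dDDH, species boundary cited in the text

# Novel species -> (ANI %, dDDH %) as quoted in the text.
REPORTED_VALUES = {
    "L. virtsii": (86.3, 28.3),      # dDDH reported as an upper bound
    "L. williamsii": (84.3, 24.7),
    "P. vandeheii": (88.4, 34.6),    # versus P. pabuli
    "S. highlanderae": (85.3, 29.8), # versus S. thermotolerans
}


def is_distinct_species(ani, dddh):
    """True when both indices fall below the species thresholds."""
    return ani < ANI_SPECIES_THRESHOLD and dddh < DDDH_SPECIES_THRESHOLD


for species, (ani, dddh) in REPORTED_VALUES.items():
    verdict = "novel" if is_distinct_species(ani, dddh) else "not resolved"
    print(f"{species}: ANI {ani}%, dDDH {dddh}% -> {verdict}")
```

Requiring both indices to fall below their thresholds is a conservative choice made for this sketch; the text reports both measures but does not state how disagreements between them would be handled.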

Phenotypic characterization
The cell size (Fig. 5), colony morphology, biochemical characteristics based on Vitek-2 (Supplemental Table S1) and BioLog GNIII (Supplemental Table S2), fatty acid profiles (Supplemental Table S3) and chemotaxonomic features (Supplemental Figure S1) of all five novel species are presented. A. burdickii can be phenotypically differentiated from other closely related Arthrobacter species since maltose, trehalose, cellobiose, turanose, and acetoacetic acid were not utilized as sole carbon substrates (Table 3). The Leifsonia species did not show any specific phenotypic characteristics that could be used to differentiate them from other closely related Leifsonia species; hence, molecular phylogeny is essential (Table 4). P. vandeheii was able to grow at 8% NaCl, which can be used as a discriminative test. Its oxidase test was also positive, whereas P. tundrae, P. xylanexedens, and P. amylolyticus were negative. In addition, P. vandeheii can be differentiated from P. taichungensis and P. pabuli, which are negative, by the utilization of Tween 40, turanose, γ-hydroxybutyric acid, L-malic acid, and L-serine as sole carbon sources (Table 5). S. highlanderae could grow at 4% NaCl only, but S. thermotolerans, S. luteola, and S. saromensis were able to withstand > 7.5-10% NaCl concentration (Table 6).

Chemotaxonomic characterization
The novel actinobacterial species, namely A. burdickii IIF3SC-B10 T, L. virtsii F6_8S_P_1A T and L. williamsii F6_8S_P_1B T, were found to contain diphosphatidylglycerol, phosphatidylglycerol and an unidentified glycolipid as their major polar lipids. Additionally, A. burdickii IIF3SC-B10 T was found to possess a significant amount of an unidentified phospholipid (PL1), a characteristic that sets it apart from Leifsonia species. P. vandeheii F6_3S_P_1C T exhibited a complex polar lipid profile, which included phosphatidylglycerol, diphosphatidylglycerol, phosphatidylethanolamine, phosphatidylserine, two unidentified phospholipids and an unidentified aminophospholipid. On the other hand, S. highlanderae F6_3S_P_2 T was found to contain phosphatidylglycerol, diphosphatidylglycerol, phosphatidylethanolamine, two unidentified phospholipids and an identified lipid. The polar lipid profiles of the new species are in excellent agreement with the data published earlier for Arthrobacter [55], Leifsonia [56], Paenibacillus [57,58], and Sporosarcina [59].
Based on this polyphasic taxonomy, the five novel species are described, and their detailed phenotypic, FAME profile, chemotaxonomic, and molecular characteristics are given below.
GC content is 46.1%. The type strain, F6_3S_P_1C T (= NRRL B-65663 T = DSMZ 115940 T), was isolated from the ARED platform aboard the ISS in 2018; its genome size is ~7.04 Mb and the sequence is available from NCBI under accession number JAROCD000000000.
Cells are Gram-positive, strictly aerobic, motile rods (0.3-0.4 µm in width and 3.3-3.7 µm in length). Spherical endospores are formed in a terminal position. Colonies grown on TSA medium are circular, convex, beige, and 4 mm in diameter after 5 days at 25°C. Optimal temperature for growth is 30°C; no growth occurs at < 10°C or > 37°C; pH tolerance is 6.1-9.3. NaCl is not required for growth but can be tolerated up to 4% (w/v). Positive for catalase and oxidase activities.

Abundance of novel species in ISS metagenomes
We conducted an analysis of metagenomic reads obtained from two microbial tracking (MT) missions, encompassing seven flights across eight locations on the ISS, with the objective of identifying novel microbial species and potentially retrieving metagenome-assembled genomes (MAGs). To assess the presence of viable and intact cells of the novel species, we utilized propidium monoazide (PMA) treatment on the samples as previously described [7,8]. Our findings revealed that the majority of the metagenomes had less than 0.1% of their total reads mapped to the novel species. Among all the species analyzed, P. vandeheii F6_3S_P_1C T exhibited the highest mapping, with a maximum of 1.05% of total reads from a sample collected during Flight 2 near the port crew quarters (location 8) during MT-1. Therefore, we can conclude that none of these novel species are dominant in the ISS. Considering the limited proportion of reads mapped to the novel species, we proceeded to perform read assembly to explore the breadth of coverage against the isolated genomes. Interestingly, reads mapped to L. virtsii F6_8S_P_1A T and L. williamsii F6_8S_P_1B T from Flight 6 at location 8 exhibited a breadth of coverage of 59.4% and 80.7%, respectively, despite representing a small fraction of the total reads (Fig. 6). However, apart from these cases, the overall breadth of coverage for the genomes was quite low, averaging 0.21%; hence no MAG was generated. We also analyzed the PMA-untreated samples and observed a similar pattern in the distribution of the breadth of coverage. Additionally, we noticed 84.4% breadth of coverage for L. williamsii F6_8S_P_1B T during Flight 7 at location 8, from where the strain was isolated. Despite the high breadth of coverage for Leifsonia genomes across multiple samples, we were unable to obtain MAGs because the minimum coverage depth threshold of 4X was not reached.
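The distinction between breadth and depth of coverage explains how a genome can be widely covered yet still fail a 4X depth requirement. The sketch below computes both from a list of alignment intervals; the interval representation, the use of mean depth as the depth statistic, and the toy numbers are assumptions for illustration, not the procedure used by the assembly tools in this study.

```python
def coverage_stats(genome_length, alignments):
    """Return (breadth_percent, mean_depth) for a reference genome.

    alignments: iterable of (start, end) half-open, 0-based intervals
    giving where each read aligns on the reference.
    """
    depth = [0] * genome_length
    for start, end in alignments:
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    breadth = 100.0 * covered / genome_length
    mean_depth = sum(depth) / genome_length
    return breadth, mean_depth


MIN_DEPTH = 4  # the 4X depth threshold mentioned in the text

# Toy example: a 100-bp reference with two overlapping alignments.
breadth, mean_depth = coverage_stats(100, [(0, 50), (25, 75)])
print(f"breadth {breadth:.1f}%, mean depth {mean_depth:.2f}X, "
      f"meets depth threshold: {mean_depth >= MIN_DEPTH}")
```

In the toy example three quarters of the reference is covered, yet the mean depth is only 1X, mirroring the situation described for the Leifsonia genomes.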

Functional characterization of the novel species
To investigate the genetic characteristics of the six novel strains, we performed a comprehensive genome annotation using RAST-tk (Table 7). Further analysis of these organisms from the ISS revealed that, on average, they possess 46 genes related to virulence, disease, and defense. Two mechanisms were predicted as the primary functions of these genes: resistance to antibiotics and toxic compounds, and invasion and intracellular resistance.
A. burdickii IIF3SC-B10 T , in particular, is the only species that harbors the sarcosine oxidase (EC 1.5.3.1) gene, which is involved in the osmotic stress response. On the other hand, P. vandeheii F6_3S_P_1C T and S. highlanderae F6_3S_P_2 T , both belonging to the phylum Firmicutes, possess a speci c mechanism to respond to bacitracin-induced stress through the bceABRS four-component system. Notably, these two species also possess additional stress response mechanisms for periplasmic stress via the intramembrane protease RasP/YluC. While exploring other factors, it was observed that A. burdickii IIF3SC-B10 T does not possess any genes associated with motility. However, the other novel species have genes related to motility, including agellar biosynthesis proteins. Among all the species, only P. vandeheii F6_3S_P_1C T has the chemotaxis subsystem.
We conducted further investigations into the metabolic potential of these novel species and made some noteworthy observations. A. burdickii IIF3SC-B10 T was found to possess a signi cantly higher number of genes related to aromatic compound metabolism, while having fewer genes associated with iron metabolism compared to the other species characterized in this study. With the exception of P. vandeheii F6_3S_P_1C T , all the other species exhibited the capability to perform polyhydroxybutyrate metabolism. However, only P. vandeheii F6_3S_P_1C T , due to the presence of gamma-glutamyl transpeptidase (EC 2.3.2.2), can utilize glutathione as a sulfur source.
Considering the concerns regarding spore-forming bacteria and their resistance to sterilization processes in the ISS, we further explored the spore-forming capabilities of these novel species. As expected, signi cant number of genes associated with dormancy and sporulation were predicted in P. vandeheii F6_3S_P_1C T and S. highlanderae F6_3S_P_2 T genomes. Among them, P. vandeheii F6_3S_P_1C T exhibited the highest number of 40 genes, primarily associated with spore germination and spore maturation processes. In contrast, the other species are non-spore forming and do not possess speci c proteins associated with sporulation.

Antimicrobial resistance properties of the novel species
In the five novel species isolated, we searched for antibiotic resistance genes against the CARD database [60] and calculated the percentage identity with the reference sequences. Overall, these genomes showed potential resistance to seven drug classes, including rifamycin and tetracycline antibiotics. Interestingly, we found that all of these genomes contained genes from vancomycin resistance gene clusters, with identities ranging from 30.7% to 51.5%. Among other identified resistance genes, we discovered the presence of rifampicin monooxygenase (RIFMO), which catalyzes the inactivation of the antibiotic rifampicin, in the Leifsonia species with a 63% match.
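As a reading aid for the identity values reported here, the sketch below shows one common way to compute percent identity from a pairwise alignment: the number of identical, non-gap columns divided by the total number of alignment columns. This definition, the function name, and the toy sequences are assumptions for illustration only; the tool and the exact definition used to score hits against CARD in this study may differ.

```python
def percent_identity(aligned_query: str, aligned_ref: str, gap: str = "-") -> float:
    """Percent of alignment columns where both sequences carry the same residue.

    Both inputs must already be aligned (equal length, gaps written as `gap`).
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment must not be empty")
    matches = sum(
        1
        for q, r in zip(aligned_query, aligned_ref)
        if q == r and q != gap  # identical residues; gap columns never count
    )
    return 100.0 * matches / len(aligned_query)


# Toy alignment (hypothetical sequences): 4 identical columns out of 6.
toy_identity = percent_identity("MKT-LV", "MKSALV")
```

Under this definition, hits at roughly 30% identity (such as those reported for the vancomycin cluster genes) share fewer than one in three aligned positions with the reference, which is why such predictions are usually treated as tentative.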
In P. vandeheii F6_3S_P_1C T, we identified a set of markers, including Llm 23S ribosomal RNA methyltransferase (LlmA_23S_CLI) and chloramphenicol acetyltransferase (CAT), which exhibited high sequence identities of 84.67% and 87.91%, respectively. LlmA_23S_CLI was originally detected in Paenibacillus sp. LC231, a strain isolated from Lechuguilla Cave, NM, USA [61]. Additionally, strain F6_3S_P_1C T was found to possess two genes, qacG and qacJ, which are part of a small multidrug resistance efflux pump conferring resistance to quaternary ammonium compounds (QACs).
Furthermore, in P. vandeheii F6_3S_P_1C T and S. highlanderae F6_3S_P_2 T, we identified the tetracycline-resistance ribosomal protection genes tetB(P) and tet(Q), respectively, with approximately 30% similarity. These provide resistance by preventing the binding of the antibiotic tetracycline to the bacterial ribosome. Moreover, these genomes also encode orthologues of the antibiotic-inactivating enzyme fosfomycin thiol transferase. Genome mining predicted the presence of these AMR genes; confirmation of phenotypic resistance requires further investigation.

Production of secondary metabolites
To explore the potential for producing secondary metabolites in the newly discovered species, we utilized antiSMASH, a bioinformatics tool for predicting putative biosynthetic gene clusters (BGCs). This analysis revealed a total of 16 cluster types, including betalactone and type III polyketide synthase (T3PKS) clusters (Table 8). In A. burdickii IIF3SC-B10 T, we identified a moderately matched BGC for the thiopeptide antibiotic TP-1161, known for its effectiveness against multidrug-resistant Gram-positive bacteria and fungi [62]. Furthermore, in the isolated Leifsonia species, we found two well-known gene clusters: T3PKS alkylresorcinol and NAPAA (non-alpha poly-amino acids) ε-Poly-L-lysine (ε-PL), both with a 100% match. The Leifsonia species shared a partially matched carotenoid BGC with P. vandeheii F6_3S_P_1C T. However, several unique cluster types, including cycliclactone-autoinducer, lanthipeptide, lassopeptide, NRP-metallophore, opine-like-metallophore, and proteusin, were identified exclusively in P. vandeheii F6_3S_P_1C T. Notably, within the F6_3S_P_1C T strain, we also identified the BGCs paeninodin (60%) and bacillopaline (100%). Lastly, S. highlanderae F6_3S_P_2 T exhibited one phosphonate, one T3PKS, and one terpene BGC, although these clusters have not yet been fully characterized.

Table 8
Each box depicts the percentage of similarity with the reported biosynthetic gene cluster. Unknown indicates that a BGC was identified but a percentage similarity was not calculated, since no known BGC was found for comparison. Empty cells indicate that the BGC was not predicted in that genome.

Discussion
New launch technology and new investment in human exploration of space by governments and private industry are leading to a revitalization of the idea of long-term space habitation. Missions to the moon are already underway, and missions to Mars are planned for the near future. Such missions will be measured in multiple years rather than in months and will have little or no resupply from Earth. In such cases, the microbiome of the space vessel or habitat will need to be monitored for multiple reasons: the spread of pathogens through the air or on surfaces which could infect humans [63] or plants [64], the spread of antimicrobial resistance genes [65], the health of human commensal microbiomes (and potential overgrowth of secondary pathogens), and the potential for biofouling of fluid lines or water supplies via microbial overgrowth [66]. Also, with no resupply from Earth there is no access to Earth's massive microbial biodiversity. Unless specific microbes are stocked as supplies before launch [1], the microbes found on the spacecraft or habitat are the only ones which could be used for the many commercial purposes that microbes serve on Earth: antibiotic or therapeutic discovery, manufacturing of drugs, food, and vitamins, plant growth enhancement, probiotics, etc. Biological in-situ resource utilization may also require bioremediation or bioconversion of raw, potentially toxic materials collected from moons, other planets, or asteroids/comets.
Whether or not the microbiome of a space habitat can be controlled and repurposed to this extent depends on a number of factors, including 1) whether the microbial diversity of such a space habitat would be sufficient to include all the traits desired for the many purposes listed above, 2) accurate detection and identification of already-known microbes and taxonomic placement of unidentified microbes, including whether shotgun metagenomic sequencing would detect the presence of problematic microbes from low-biomass surfaces, and 3) characterization of potential phenotypic traits based on genomic predictions. Like the proverbial mustard seed, perhaps we inadvertently carry a planet's worth of microbial diversity wherever we travel. The novel microbes described herein are not necessarily any more noteworthy than those which might be isolated from an office building on Earth (though they are likely far more resilient given the harsh conditions [67] of the space environment), and yet each hosts significant potential for affecting human health [5] or for use in assisting plant growth [68], bioremediation, or manufacturing, and offers a glimpse into the genetic and metabolic potential of the microbial diversity of the ISS [69].
The most abundant cultivable microbes on ISS surfaces include common, well-studied human commensals such as Staphylococcus, Rhodotorula, Penicillium, and Micrococcus species. However, there are many more that have only been isolated once aboard the ISS and which are at very low abundance, potentially shed from individual astronauts, from experiments such as plant grow-ins, from new pieces of cargo, or from the vast microbial diversity of the human gut, and which can be considered a part of the rare microbiome. Although individually rare, members of this community collectively play significant roles in ecosystem functioning and stability, including functional redundancy, which enhances the resilience and stability of ecosystems by ensuring that multiple microbial species can perform essential ecological functions such as nutrient cycling, decomposition, and symbiotic interactions. The novel bacterial species described in this study belong to the rare microbiome, since their incidence in the shotgun metagenomes was very low, and only the two Leifsonia species had a sufficiently high breadth of coverage of mapped metagenomic reads from the ISS crew quarters to have been definitively identified using shotgun metagenomic sequencing without culturing.
Prior to the use of a WGS approach, the diversity of the ISS cultivable microbiome was significantly underestimated due to reliance on 16S-based taxonomic identification alone. For many bacterial genera, however, 16S rRNA gene sequencing fails to differentiate new species with significantly different phenotypic traits. For instance, there is 99.8% similarity between the 16S rRNA genes of S. highlanderae F6_3S_P_2 T and S. thermotolerans CCUG 53480 T, with a mere 3 base pair substitutions. Without access to the whole genome, S. highlanderae would be categorized as S. thermotolerans, even though it is not a thermophile. Upon accessing all 93 Sporosarcina genomes from the NCBI database and generating an ANI heatmap (Supplemental Figure S2), it was evident that this clade contains at least five novel genera and 56 species which are yet to be described. This inference was based on ANI values of less than 70% for 14 Sporosarcina genomes, encompassing S. highlanderae, S. thermotolerans, and S. luteola, and further emphasizes that the 16S rRNA gene on its own is not a reliable tool for differentiating among members of the Sporosarcina genus. Placing these 93 genomes into their phylogenetic affiliation requires more study.
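To illustrate how a pairwise ANI matrix can be turned into counts of candidate taxa, the sketch below groups genomes by single-linkage clustering at a chosen ANI cutoff. It is not the method used to produce Supplemental Figure S2: the function name, the genome labels, and the ANI values are hypothetical, the 95% default is an assumed species-level cutoff in line with common practice, and the 70% value simply reuses the figure quoted above as a looser, illustrative cutoff.

```python
from typing import Dict, List, Set, Tuple


def cluster_by_ani(
    genomes: List[str],
    ani: Dict[Tuple[str, str], float],
    threshold: float = 95.0,  # assumed species-level cutoff; adjust as needed
) -> List[Set[str]]:
    """Single-linkage clustering: genomes are joined when pairwise ANI >= threshold."""
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[str, Set[str]] = {}
    for g in genomes:
        clusters.setdefault(find(g), set()).add(g)
    return list(clusters.values())


# Toy data: hypothetical genome labels and made-up ANI percentages.
toy_genomes = ["genome_A", "genome_B", "genome_C", "genome_D"]
toy_ani = {
    ("genome_A", "genome_B"): 98.2,
    ("genome_A", "genome_C"): 78.0,
    ("genome_B", "genome_C"): 77.5,
    ("genome_A", "genome_D"): 69.0,
    ("genome_B", "genome_D"): 68.5,
    ("genome_C", "genome_D"): 68.9,
}

species_level = cluster_by_ani(toy_genomes, toy_ani)               # 95% cutoff
deeper_level = cluster_by_ani(toy_genomes, toy_ani, threshold=70)  # looser cutoff
```

With these toy values, the 95% cutoff yields three clusters and the 70% cutoff yields two, showing how the number of inferred groups depends directly on the threshold chosen.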
Upon examining 337 ISS bacterial genomes (plus six novel strains) belonging to 36 bacterial species (plus five novel species), it was observed that nondominant, rare, and phylogenetically undescribed species were predicted to produce natural products. As their genetic and phenotypic potential remains uncharacterized, exploration of the rare microbiome can lead to the discovery of novel bioactive compounds, enzymes, and metabolic pathways. Many of these rare microorganisms have untapped biotechnological potential, with applications in fields such as medicine, agriculture, industry, and environmental remediation. Studying the rare microbiome can uncover valuable resources for the development of new biotechnological tools and processes.
In the case of A. burdickii IIF3SC-B10 T, we identified a moderately matched BGC for the thiopeptide antibiotic TP-1161, known for its efficacy against multidrug-resistant Gram-positive bacteria and fungi [62]. The Leifsonia species contain an alkylresorcinol BGC; alkylresorcinols exhibit various activities, including anticancer, anti-inflammatory, antimicrobial, antioxidant, and genotoxic effects [70]. Additionally, alkylresorcinol plays a role in bacterial cyst formation during unfavorable environmental conditions [71]. On the other hand, ε-PL is responsible for antimicrobial activity against food spoilage and food-poisoning bacteria [72].
In addition, A. burdickii and P. vandeheii also harbor metal-dependent β-lactamase superfamily I and III proteins, which are known for their involvement in the hydrolysis of β-lactam antibiotics [73]. This enzyme family plays a significant role in conferring resistance to β-lactam antibiotics, including penicillins and cephalosporins. Furthermore, multidrug resistance efflux pumps, such as the acriflavine resistance protein and the Multidrug And Toxic Compound Extrusion (MATE) family of Multidrug Resistance (MDR) efflux pumps, were found in the Leifsonia species and the spore-formers. These efflux pumps contribute to bacterial resistance by actively pumping a wide range of antimicrobial compounds, including antibiotics and toxic compounds, out of the cell, thereby reducing their intracellular concentrations and promoting bacterial survival [Ref?]. The acriflavine resistance protein (AcrA) is a crucial component of the AcrAB-TolC efflux pump, which confers resistance to acriflavine and other antimicrobial compounds. Its role in antibiotic resistance, multidrug efflux, intracellular homeostasis, biofilm formation, and potentially bacterial virulence underscores its significance in bacterial survival and adaptation. Given this involvement, AcrA has emerged as a potential target for the development of novel antimicrobial agents [74]. By inhibiting the function of AcrA or other components of the AcrAB-TolC efflux pump, it may be possible to overcome bacterial resistance and enhance the effectiveness of existing antibiotics against multidrug-resistant bacterial infections.
Identification of the fosfomycin resistance protein (fosB) in both the Paenibacillus and Sporosarcina genomes in this study is crucial for effective infection control measures and the development of strategies to combat the spread of antibiotic resistance. It has been reported that fosB is significant due to its impact on the treatment of bacterial infections, the emergence of multidrug resistance, the potential for horizontal gene transfer, and its implications for public health [75].
Streptothricin acetyltransferase was present in both the Paenibacillus and Sporosarcina genomes. Streptothricin has been reported to be a valuable antibiotic with broad-spectrum activity against microorganisms, and it has shown efficacy in agricultural practices, particularly in plant and fungal disease management, where it can help reduce crop losses and increase agricultural productivity [76]. Its significance extends beyond its direct antimicrobial properties, finding applications in research, agriculture, and drug development. Understanding streptothricin's mode of action and resistance mechanisms contributes to our knowledge of antibiotics and aids in the development of novel strategies to combat bacterial infections.
The ribosome protection-type tetracycline resistance-related proteins, group 2, are crucial determinants of resistance to tetracycline antibiotics in both Gram-negative and Gram-positive microbes [77]. Their ability to protect ribosomes from the inhibitory effects of tetracycline enables microbial survival and growth in the presence of the antibiotic. Why these proteins were present in the spore-forming novel species in this study but not in the non-spore-forming species requires further research.
Choloylglycine hydrolase, which was predicted only in the S. highlanderae genome, plays a critical role in bile acid metabolism, enterohepatic circulation, and the regulation of the bile acid pool [78]. Its activity influences the composition and function of the gut microbiota and has implications for host health and disease [79]. Understanding the significance of this enzyme provides insights into bile acid metabolism disorders [80] and potential therapeutic approaches for related conditions.
Multiple genes for bioremediation of toxic material, enhanced plant growth, and survival in extreme conditions are predicted in the genomes of these novel species. Key functional genes identified by genome mining of the novel species described in this study are listed in Table 7. The magnesium and cobalt efflux protein (CorC) plays a significant role in maintaining metal ion homeostasis, protecting against metal toxicity, facilitating adaptation to metal-rich environments, and contributing to the bacterial stress response. Its activity is important for cellular functions and can also impact antibiotic resistance. The identification of CorC in only the three Leifsonia genomes during this study, and not in the other novel species, holds significant potential for enhancing our understanding of the mechanisms employed by this actinobacterial group to regulate metal ions and adapt to diverse environmental conditions.
In summary, the rare microbiome is instrumental in maintaining ecosystem stability, adapting to environmental changes, facilitating ecological interactions, spurring biotechnological innovation, and bolstering conservation efforts. Investigations into, and preservation of, the rare microbiome enhance our understanding of microbial diversity and ecosystem dynamics, thereby contributing to the sustainable management of ecosystems. Conservation strategies should consider the preservation of rare microorganisms, as their loss could precipitate cascading effects on ecosystem functioning and resilience. Our study of novel microbes and predicted bioactive compounds contributes to our understanding of the microbial ecosystem on the International Space Station (ISS) and lays the groundwork for further investigation into the potential implications of these novel species for the health and well-being of the ISS crew, as well as future space missions. The presence of specific genes and proteins in these novel species underscores their adaptive capabilities and potential resistance mechanisms against a variety of environmental challenges, including exposure to antibiotics. A deeper understanding of the genetic composition and functional capabilities of these novel species provides valuable insights into their survival strategies and could contribute to the development of improved antimicrobial therapies and strategies to combat antibiotic resistance.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
Since no human or animal subjects were used, no ethics approval was needed. Table 1 and the genome versions described in this paper are the first versions.

CONSENT FOR PUBLICATION

Figure 1
Phylogenetic tree of Arthrobacter species including strain IIF3SC-B10 based on a. 16S genes and b. 138 single-copy core genes of phylum Actinobacteria, keeping Kocuria rhizophila as an outgroup.

Figure 2
Phylogenetic tree of Leifsonia species including strains F6_8S_P_1A, F6_8S_P_1B and F6_8S_P_1C based on a. 16S genes and b. 138 single-copy core genes of phylum Actinobacteria, keeping Nocardia fluminea as an outgroup.

Figure 3
Phylogenetic tree of Paenibacillus species including strain F6_3S_P_1C based on a. 16S genes and b. 119 single-copy core genes of phylum Firmicutes, keeping Bacillus subtilis as an outgroup.

Figure 4
Phylogenetic tree of Sporosarcina species including strain F6_3S_P_2 and Sporosarcina thermotolerans CCUG 53480 based on a. 16S genes and b. 119 single-copy core genes of phylum Firmicutes, keeping Paenibacillus polymyxa as an outgroup.