Genome Sequences of Brucella Melitensis Strains QH2019001 and QH2019005 Provides Insight Into Different Evolutionary Origins Within Qinghai Province, China

Blood samples were isolated from two brucellosis males that were conrmed B. melitensis positive following multiple loci variable-number tandem repeat analysis (MLVA) and strains QH2019001 and QH2019005 were identied. Genomic DNA was extracted from these samples and whole genome sequencing (WGS) was performed. Next, SNP-based phylogenetic analysis was performed to compare the two strains to 94 B. melitensis strains (complete genome and draft genome) retrieved from online databases.


Results
The QH2019001 and QH2019005 strains were found to have differences in their variable-number tandem repeats. Phylogenetic examination of the 96 strains assigned them to 5 genotype groups, with QH2019001 and QH2019005 assigned to the same group, but to different subgroups; with the QH2019005 strain assigned to a new subgenotype, IIj. These ndings were then combined to determine strain origin.

Conclusions
Utilizing a whole genome SNP-based approach highlights the diversity between B. melitensis strains, and it can facilitate the elucidation of different evolutionary histories. This approach also revealed that the QH2019005 is part of a new subgenotype with an ancient origin in the eastern Mediterranean Sea.

Background
Brucellosis, which is caused by bacteria in the Brucellagenus, is one of the most important zoonoses worldwide and is considered a "forgotten, neglected zoonosis" by the WHO (World Health Organization 2014) [1]. Brucellosisis endemic in regions within Africa, Asia,and Latin America and other countries along the Mediterranean Sea [2,3].Human infections can occur due to the consumption of contaminated non-pasteurized milk or cheese, or by occupational exposure to infected animals or their carcasses, uterine secretions, or aborted fetuses [4]. While the mortality rate of brucellosis is low, the morbidity rate is much higher. Worldwide, incidence of human brucellosis varies widely, from < 0.01 to > 200 per 100,000 people in endemic disease areas [5]. In mainland China, the total incidence rate of human brucellosis increased from 0.92per 100,000 people in 2004 to 4.2 per 100,000 people in 2014 [6]. Thus, brucellosis is becoming a major public health problem that impacts human physical and mental health.
To ensure accurate epidemiological surveillance and to enable differentiation between infected and vaccinated individuals, species identi cation and subtyping is essential [10].Currently,Brucella identi cation and subtyping is performed usingmultiple loci variable-number tandem repeat analysis (MLVA), a genotyping tool for species identi cation and genetic diversity assessment [11]. However, MLVA is time-consuming and laborious and cannot cover all of the currently known Brucella speciesand biovars. An alternative to the MLVA approach is to utilize whole genome sequencing (WGS)-based strain typing combined with single nucleotide polymorphism (SNP) analysis to obtain a higher resolution and ne-scale differentiation of Brucella genotypes and subgenotypes [12].
In humans, B. melitensis, which contains three biovars (1, 2 and 3), is the most virulent species and was the predominant strain associated with human brucellosis in China from 1953 to 2013 [13]. While all three biovars cause disease in small ruminants, their geographic distributions vary [14]. In a study examining Brucella in the Qinghai-Tibet Plateau region, none of the genotypes matched any of the sequences in the Brucella 2012 MLVA database, possibly due to these strains having unique geographical characteristics in the Qinghai-Tibet Plateau region and B. melitensis being the predominant species in the area [15].
Recently, the prevalence of human brucellosis in Qinghai Province has increased from 0.04 per 100,000 people in 2011 to 1.96 per 100,000 people in 2018. Cases are distributed across 31 counties within the entire province, thus making infectious source tracking di cult. Herein, two brucellosis positive individuals from the Qinghai-Tibet Plateau region in China were examined and B. melitensis strains QH2019001 and QH2019005 were identi ed using MLVA. These strains were then further examined using WGS and SNP-based phylogenetic analysis to examine associations with other B. melitensis strains and to determine origin.

Ethics statement
This research was conducted according to the principles of the Declaration of Helsinki, and the study protocol was approved by the Ethics Committees of the National Institute for Communicable Disease Control and Prevention (Beijing, China). Informed consent was obtained from the two patients who received a brucellosisdiagnosis at the Qinghai Institute for Endemic Disease Prevention and Control based on the Diagnosis for Brucellosis Standards (WS 269-2019) used in China. Obtained patient history/data were anonymized for the purpose of this study. Blood samples were obtained and Brucella strains were isolated.
Genomic DNA sample preparation Genomic DNA was extracted using a sodium dodecylsulfate (SDS)-based method [17]. The obtained DNA was assessed via agarose gel electrophoresis and quanti ed using a Qubit 2.0 Fluorometer (Thermo Scienti c, USA).

Whole genome sequencing
Sequencing libraries were generated using a NEBNext® Ultra™ DNA Library Prep Kit for Illumina (NEB, USA) following the manufacturer's recommendations, with 1 µg DNA used per sample. Unique codes were added to attribute sequences to each sample. Brie y, DNA samples were fragmented by sonication to a size of 350 bp. The DNA fragments were then end-polished, A-tailed, and ligated with full-length adaptors for Illumina sequencing prior to PCR ampli cation. Finally, PCR products were puri ed using an AMPure XP system (Beckman Coulter, USA) and libraries were analyzed for size distribution using an Agilent 2100 Bioanalyzer and quanti ed using real-time PCR. WGS of the QH2019001 and QH2019005 strains was performed using an Illumina NovaSeq 6000 (PE 150 bp) at Novogene Bioinformatics Technology Co., Ltd. (Beijing, China).

Genome assembly and annotation
The raw sequencing data were ltered to obtain clean data(the original data must be ltered to obtain valid data).Genome assembly was then performed using the clean data with multiple k-mers as follows:1) assembly with SOAPdenovo using multiple k-mers (default parameters of 95, 107, and 119), with the optimal k-mer obtained by adjusting other parameters (-d -u -R, and -F) and by selecting a preliminary assembly with the least scaffolds; 2) assembly with SPAdes using multiple k-mers (default was 99 and 127), with the assembly with the optimal k-mer and the least scaffolds selected; 3) assembly with Abyss using a k-mer of 64; 4) assembly results from the three software were integrated using CISA software and the assembly result with the least scaffolds was selected; 5) Gapcloser was used to ll gaps and lter reads with low sequencing depth (< 0.35 of the average depth) to obtain the nal assembly result; and 6) fragments below 500 bp were ltered out and the nal result was utilized for the gene prediction. A whole genome Blast search (E-value less than 1e-5, minimal alignment length percentage larger than 40%) was performed with a threshold of identity ≥ 40% and coverage ≥ 40%.The obtained genomic sequences were deposited in the Microbial Pathogens Database (http://data.mypathogen.org.; accession numbers: ICDC-20200117-135823-300 and ICDC-20200117-140311-357).

SNP discovery and phylogenetic construction
Genomic alignment between the sample genome and reference genome (or among more than two sample genomes) were performed using the MUMmer [18] and LASTZ [19][20] tools. Genomic synteny was analyzed based on the alignment results.
SNP-based phylogenetic analysis was then performed to infer the relationships between the B. melitensis strains. SNP was found by the genomic align ment results among samples by the MUMmer and LASTZ .
The phylogenetic trees were constructed using TreeBeST and PhyML with orthologous genes and bootstraps of 1,000.
For the two strains that were sequenced, rapid whole genome alignment was performed using MUMmer and LASTZ, with 94 strains of B. melitensis that are publicly available utilized as reference genomes. The obtained alignment data were then analyzed for genomic synteny.Gene family was constructed by the gene of the two strains and reference strains , using the multiple softwares: Blast [21] was used to pairwise align all genes and eliminate the redundancy by solar and carried out gene family clustering treatment based on the alignment results with Hcluster_sg software.

General conditions
The rst B. melitensis strain, QH2019001( isloated and assayed in 2018, sequenced in 2019 ) was isolated from a male worker in Qinghai province, China. This man went to his hometown (Anhui Province) in April 10, 2019 for 12 days where he had contact with sheep (unknown origin) when cleaning sheepfolds. The second B. melitensis strain, QH2019005, was isolated from a male farmer in Qinghai Province that made a living as a sheep herdsman in Yushu, Maduo, and Haiyan Counties in Qinghai Province. The two males showed clinic symptoms and had positive serum agglutination tests (SATs) at titers of 1:100. When examining epidemic factors, clinical manifestation and serum detection (Table 1), the two males were diagnosed as brucellosis positive and the blood cultures resulted in the isolation of Brucella strains QH2019001 andQH2019001 that were identi ed asB. melitensis biovar 3.

StrainIDs
Host The two strains showed a 99.1% genotype similarity. For the two B. melitensis strains, variablenumber tandem repeat (VNTR) differences were present in bruce08 and bruce42 (panel 1); while, bruce19, bruce21, bruce04, bruce09, and bruce16 (panels 2A and 2B) also exhibited VNTR differences.Furthermore, thetwo B. melitensis strains were also similar to the other strains in Qinghai Province.When combining these ndings with epidemical data, B. melitensis strain QH2019001 appears to have been imported from Anhui Province, while the QH2019005 strain appear to be native to Qinghai Province (Fig. 1).

Genomiccharacteristic
To determine the origins of the two strains, WGS was performed using a high performance sequencing platform, Ion Torrent PGM (Life Technologies, USA), and followed by de novo assembly. The two strains comprised 24 contigs, with N50s 297 and 202 bp (QH2019001) and 332 and 914 bp (QH2019005). The total contig sequence lengths were 3,291,786 bp for QH2019001 and 3,289,996 bp for QH2019005, with both having a GC content of 57.24%.
The obtained sequences were then compared to 94 B. melitensis strain sequences using TreeBeST and PhyML. Phylogenetic analysis based on all of the whole genome SNPs provided insight into the overall geographical distributions of the 96 examined B. melitensis strains. The strains were divided into ve major genotypes as follows: genotype I, Bruc048 and its related strains; genotype II, UK3/06 and its related strains; genotype III, strains of African origin; genotype IV,B115 and its related strains; andgenotype V, 16 M and its related strains (Fig. 2). The QH2019001 and QH2019005 strains were all associated with the genotype II group. Furthermore, several clades and subclades were isolated within the lineages and were associated with geographic attributes. The QH2019001 strain was assigned to subgenotype IIh, while the QH2019005 strain was assigned to subgenotype IIj (a new subgroup).

Discussion
Due to thehigh mutation rate in tandem DNA repeats loci, the MLVAplatform provides the ability to distinguish eld and vaccine strains [22], associate isolates with geographic origins, and discriminate most Brucella species with a high homogeneity discriminatory power [23]. In China, MLVA genotyping provides a basis for reliable disease monitoring when combined with strain phylogeny and regional and time distributions, thus potentially providing valuable insight when examining an epidemiological linkage or trace-back investigation during a brucellosis outbreak [13,24,25].Furthermore, previous studies have shown thatMLVA can be used as an epidemiological tool for strain resolution and to distinguish relapses from reinfections in brucellosis patients [26]. The two Brucella strains identi ed herein, which were isolated from the Qinghai province, had difference when compared to those in the Brucella 2012 MLVA database [1]. Furthermore, the QH2019001 and QH2019005 strains did exhibit VNTR difference, but most of the strains were similar to each other. However, due to the high genetic diversity of the Brucella genus, the MLVA approach can be limited based on the nature of VNTRs and the technique itself [27], thus the exact origins of the two strains could not be determined.
One study showed that when utilizing only a MLVA approach, only 52 genotypes from 63 patients can be distinguished; however, when utilizing a whole genome SNP-based approach, all patients can be distinguished from each other,thus demonstrating a higher discriminatory power [28]. Furthermore, WGSbased analysis has been shown to distinguish very closely related B. melitensis strains (up to 6 genes or 7 SNPs difference), while MLVA pro ling might not provide enough resolution to accurately predict phylogenetic relationships [29]. Thus, SNP data can enable the development of novel high-resolution molecular typing techniques that can discriminate inter-and intraspecies relationships in pathogenic microorganisms.
Following phylogenetic clustering, the B. melitensis strains (n = 96) were divided into ve genotypes [24],with genotype I forming the earliest diverging clade. Most of the Asian isolates of B. melitensis clustered into genotype II,while genotypeIII was associated with Africaand genotypes IV and V were associated withEurope and the Americas, respectively, which is consistent with previous ndings [30].