M. pneumoniae strains
This study comprised M. pneumoniae strains detected from children with pneumonia at two hospitals during two consecutive outbreaks of M. pneumoniae pneumonia in South Korea in 2010–2012 and 2014–2016. Specimens were obtained from Seoul National University Children’s Hospital (Seoul) and Seoul National University Bundang Hospital (Seongnam). Epidemic periods and the diagnosis of M. pneumoniae pneumonia were defined as previously described (17).
Culture of M. pneumoniae was performed using pleuropneumonia-like organism broth as previously described (4). Reference strain M129 (ATCC 29342) was used as a positive control for culture.
DNA was extracted directly from cultivated M. pneumoniae using an extraction kit (DNeasy Kit; QIAGEN, Hilden, Germany) according to the manufacturer’s instructions. The p1 gene was amplified by PCR for the confirmation of M. pneumoniae.
MLST analysis and P1 typing
MLST was performed on the M. pneumoniae DNA samples as previously described (10). P1 subtypes and each subtype variants were determined by sequencing the RepMP2/3 and RepMP4 genes and in comparison with previously published data (29, 30).
Selection of strains for Whole-genome analysis
A total of 30 strains were selected for the whole-genome sequencing (WGS) investigation. Thirty-seven M. pneumoniae strains were isolated during the 2010-2012 epidemic. P1 subtype 1 accounted for 71.9% and ST3 was responsible for 62.2%. The remaining 37.8% consisted of ST1, ST14, ST17, and ST33. In contrast, among the 45 isolates detected during the 2014-2016 epidemic, P1 subtype 1 accounted for 50.0% and the ST distribution was 88.9% for ST3 and 11.1% for ST14. In order to include as many different STs as possible, all strains that showed STs other than ST3 (ST1, ST14, ST17, and ST33) were included for WGS analysis. We have randomly selected 20 ST3 strains from each epidemic.
Next-generation sequencing (NGS)
The library for whole genome sequencing was prepared using Truseq Nano DNA Lib Prep Kit (Illumina, San Diego, CA, USA) and sequenced using MiSeq Reagent Kit V2 (Illumina, San Diego, CA, USA) on the Illumina MiSeq desktop sequencer (Illumina, San Diego, CA, USA). Illumina NGS workflows include four basic steps: library preparation, cluster amplification, sequencing and alignment. The NGS library is prepared by fragmenting a genomic DNA sample and ligating specialized adapters to both fragment ends. The library is loaded into a flow cell, and the fragments are hybridized to the flow cell surface. Each bound fragment is clonally amplified through bridge amplification. Sequencing repeats, including fluorescently labeled nucleotides, are added, and the first base is incorporated. The flow cell is imaged, and the emission from each cluster is recorded. The emission wavelength and intensity are used to identify the base. This cycle is repeated ‘n’ times to create a read length of ‘n’ bases. In this study, paired-end 250-bp reads were used with an average depth (coverage) of 442.93 (ranging from 172.95 to 795.39). The average number of reads during the sequencing was 1,445,719 (ranging from 564,516 to 2,596,168). Instead of directly aligning the reads to a reference sequence, de novo assembly was performed.
Genome assembly and annotation
NGS reads were assembled de novo using SPAdes (31). The number of contigs generated ranged from 3 to 8 per strain. These contigs were mapped to the M129 reference genome using the BLAST-like alignment tool (BLAT) and visualized using Integrative Genomics Viewer (IGV) (32-34). This mapping was used to develop PCR primers to join the contigs. High fidelity PCRs and Sanger sequencing were performed using standard methods. Overlapping and joining of the contigs were performed manually with Sequencher version 5.4.6 (Gene Codes Corporation, Ann Arbor, MI, USA). The initial NGS reads were aligned to the de novo assembled genome for the correction of errors. The corrected and completed circular genomes were annotated using Rapid Annotation using Subsystem Technology (RAST) (35).
Completed genomes were aligned using BRIG for the overall sequence similarity between the strains (22). MAUVE was used to detect large chromosomal rearrangements, deletions, and duplications (23). In the phylogenetic analysis with the 48 global strains downloaded from the National Center for Biotechnology Information (NCBI) were included. MAFFT was applied using the ‘FFT-NS-2’ method for multiple sequence alignment of the strains from the current study and with the global strains. Phylogenetic tree was constructed using the maximum composite likelihood approach based on neighbor-joining algorithms and visualized using Phylo.io (strains from the current study) and MEGA X (with the global strains) (36, 37). For the phylogenetic tree with the global strains, 500 iterations of bootstrapping analysis were used to generate confidence values. eBURST version 3 software (http://eburst.mlst.net/) was used to estimate the relationships among the strains and to assign strains to a clonal complex (CC) (38).
Single nucleotide polymorphism (SNP) and insertion/deletion (indel) analysis
To call SNPs and indels, completed genomes were first broken into 10-kb “reads” at 1-kb intervals and then aligned to the M129 reference strain (NCBI Accession Number NC_000912) using BWA v0.7.7 (39). Variant calling was performed using Samtools (40). The effects of the SNPs and indels in the resulting VCF files were evaluated and annotated using SnpEff v3.3 (41).
Proteins and functional analysis
For the analysis of proteins and functional annotation, PATRIC was used, and a heatmap was generated based on annotations (42). Gene translation, multiple sequence alignment and visualization of proteins were performed using Clustal Omega (43). Annotation of any hypothetical genes was performed using a BLAST search against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (44, 45).
Six reference genomes were included in each analysis as appropriate (Table 3). M. pneumoniae M129, FH, 309, KCH-402 and K405 are representatives of each P1 type and subtype. M. pneumoniae S355 is included, as this strain is one of the earliest strains that was fully sequenced and expressed macrolide resistance. Two FH strains were downloaded from NCBI, and the genome sequenced with Illumina was used as the reference genome.