Next Generation Sequencing-Based Mitochondrial Genome Analysis In Relation To The Drug- Induced Toxicity In Korean Subjects

Background: Mitochondrial variants have been investigated to be associated with many diseases, which was reported largely from European populations. Few studies, however, have annotated the whole mitochondrial DNA (mtDNA) genome associated with drug responses including adverse drug reactions (ADRs), especially in Asian populations. This study was performed to characterize mtDNA genetic proles, especially the distribution and frequency of well-known genetic biomarkers associated with diseases and drug-induced toxicity, in the Korean population by using high throughput next-generation sequencing. Results: A total of 681 variants was identied among all 118 Korean subjects. The MT-TRNP and displacement loop (D-loop) showed the highest numbers of variants (113 and 74 variants, respectively). In the D-loop, 25.4% of the subjects were identied to have m.16189T>C allele, which was known to reduce the mtDNA copy number variation in human cells. The variant m.1095T>C, annotated to aminoglycoside-induced ototoxicity, was found from only one subject in this study, while the frequency of m.2706A>G and m.3010A>G, which are associated with antibiotic-induced toxicity, were 99.15% and 30.51%, respectively. The 5 subjects were identied to have the variant m.2150T>TA, a genotype associated with highly disruptive effects on mitochondrial ribosomes. In addition, the mitochondrial haplogroups of 118 Korean subjects were found that D and M groups were the most dominant groups with the frequency of 34.74% and 16.1%, respectively. Conclusions: Our nding was constant with Korean 1K project and well reected the unique prole of mitochondrial haplogroup distribution. It was the rst study to annotate the whole mitochondrial genome with drug-induced toxicity to predict the ADRs event in clinical implementation for Korean population. further study specic mitochondrial in


Background
Mitochondria are organelles that play a central role in cellular energy suppliers. They have their own genome and genetic code, and an exceptionally high mutation rate [1]. At least 100 mutations are expected to be observed in almost 90% of non-proliferating cells, while no other cell types have fewer than 10 mutations by the age of 70 years [2]. Polymorphism of mitochondrial DNA (mtDNA) is associated with various pathophysiology, and could explain diverse vulnerability to the diseases or drug toxicity [3].
The high variability of human mitochondria has been studied in the context of common diseases. Ninetyve mitochondrial markers in the Mitomap database have been con rmed to be pathogenic as of 2020 [4]. Pathogenic mtDNA mutations were reported to be common in the general population, and are also present in some major haplogroups. Therefore, many healthy individuals carry potential harmful variants.
At least 1 in every 200 healthy subjects harbours a pathogenic variant that can be a potential cause of disease in the next generation [5].
In addition, mitochondrial polymorphisms were reported to be associated with drug responses, and can reveal the role of mtDNA variation in susceptibility to drug toxicity. It has been reported that variations in mtDNA can result in differences in mitochondrial function that, in turn, may lead to idiosyncratic druginduced toxicity [6]. Limited information, however, is available about the roles of mitochondrial haplogroups in susceptibility to drug-induced toxicity. Anti-retroviral therapy (ART), antibiotics and chemotherapeutic agents are well known drug classes in relation to the mitochondria mediated drug toxicities [7]. The mtDNA mutations and copy numbers have been proposed as potential biomarkers to monitor therapeutic responses and prognosis in cancer treatments, although the mechanisms are not well established [8,9]. There are reports that speci c mtDNA haplogroups are associated with ARTinduced peripheral neuropathy and metabolic disorders from clinical studies conducted in European patients [10][11][12][13][14][15]. The susceptibility of antibiotics toxicity is also well known to associated with mtDNA polymorphisms, especially in mitochondrial 12S rRNA and 16S rRNA genes which play an important role in preventing or inducing toxicity [16].
The mtDNA has a unique genetic feature known as 'heteroplasmy', which allows mutant and wild-type mtDNAs to coexist. It is well known that healthy individuals harbour relatively low levels (< 1%) of mtDNA heteroplasmy in general. It is known that the mtDNA heteroplasmy increases throughout the lifespan, and the initial mutant mtDNA will be the predominant type within a speci c cell. Once a certain threshold level of mutant mtDNA is above the normal range, the mutant load in uences on the cellular and patient phenotypes. Hence, a low level of heteroplasmy would be linked to late-onset of related diseases in an individual subject [17]. Since the Next-generation sequencing (NGS) technology is a tool to identify and quantify the level of heteroplasmy at the 1% range, this high throughput technology has been implemented extensively towards decoding the mitochondrial genome in many ethnic populations [18][19][20][21][22][23]. Two reports are available so far for the NGS-based mitochondrial genome study in a Korean subject [24,25], and they found the similar pro le of mitochondrial genetic variants with a Northern Chinese Han and Japanese populations. However, those reports are not issued on the potential implementation of those mtDNA variants as potential biomarkers for the prediction of drug induced toxicity.
In this study, therefore, the mtDNA genetic pro les of 118 Korean subjects were analysed by using the NGS approach, in order to investigate the distributions and frequencies of known mtDNA variant biomarkers associated with diseases and drug-induced toxicity in a Korean population.

Methods
Sample collection and DNA extraction DNA samples were harvested from peripheral blood that were selected from biobank in Pharmacogenomics Research Center, Inje University, Korea. In totally, 118 Korean subjects, all of whom provided written informed consent to participate in the genotyping analysis, were adults with a mean age of 37.1 years (range: 19-65 years). The study population consisted of 65 males (55%) and 88 (74.6%) non-smokers. Sixty-eight subjects were diagnosed with drug-susceptible tuberculosis and treated with rst-line anti-tuberculosis drugs (rifamycin, isoniazid, pyrazinamide and ethambutol). These subjects had no adverse drug reactions, no comorbidities and normal laboratory test results during treatment. Blood samples were collected at the time of recruitment and frozen at −80°C. Total DNA was isolated from peripheral blood samples using a QIAamp DNA Blood Mini Kit (Qiagen, German) according to the manufacturer's instructions. DNA purity and concentration were measured using a NanoDrop 2000 Spectrophotometer (Thermo Fisher Scienti c, USA).
Ampli cation of mitochondrial long-range fragments A well-established method for exclusive ampli cation of mitochondrial long-range fragments as described by Gould et al. [24] was used to avoid the co-ampli cation of nuclear genomic DNA. The PCR primers designed for the study included MT-L1_F

Analysis of mitochondrial genomes
A bioinformatics pipeline was developed to reconstruct and analyse human mtDNA from high-throughput sequencing data. The sequencing data obtained from the Illumina MiSeqDx were quality-controlled using FastQC (version 0.11.9) [28] and MultiQC (version 16) [29]. All adapters and sequences with a quality score < 30 and sequence length < 70 bp were removed using Trimmomatic (version 0.39) [30]. In the alignment process, to minimise the nuclear mitochondrial DNA segment (NUMT), the quali ed data of each sample were aligned simultaneously to human genome reference hg19 based on the revised Cambridge Reference Sequence (rCRS) of human mitochondrial DNA [31] using BWA (version 0.7.17-r1188) [32]. Regarding downstream alignment processes such as conversion from Sequence Alignment/Map (SAM) to Binary Alignment/Map (BAM) format and marking and removing duplicates were performed using GATK (version 4.1.8.1). The nal alignment assessment was performed using Qualimap (version v.2.2.1) [33]. Variants and heteroplasmy were called using GATK Mutect2 (version 4.1.8.1) in mitochondrial mode. All variants with a heteroplasmy level < 5% were ltered for further analysis. Other ltering processes were in accordance with the instructions of the GATK best practices for mitochondrial genomes [34]. The results of the variant-calling pipeline were con rmed based on the NIST mtDNA standard (GM09947A) [26] before applying to real samples. Finally, all variant-calling pipelines were combined. The variants were annotated using HmtNote (version 0.7.2) [35] against the HmtVar database [36]. The haplogroups of all samples in the cohort were also checked using HaploGrep2 [37].

Distribution of mitochondrial variants in the sequenced mtDNAs
The bioinformatics pipeline was con rmed to be consistent with the variant-calling pipeline for the mtDNA standard sequence [26]. According to NIST [38], the percentage of heteroplasmy for m.1393G>A was 17.4 ± 1.7% and that for m.7861T>C was 74.6 ± 14.5%, while the values in our study were 17.4 ± 0.6% and 89.5 ± 0.1%, respectively. These results con rmed the accuracy of the bioinformatics pipeline and experimental procedure for mtDNA sequencing and analysis. The average mtDNA coverage for the examined samples was 2,248X ± 599X.
We identi ed a total of 681 variants called from 118 individuals ( Figure 1). Most of the variable positions were transitions (86.5% overall), while the overall percentage of transversions was about 3%. Our study showed that transitions were dominant, as reported previously for human mitochondrial genomes [39]. Indel polymorphism was detected at a rate of 8.5% among all individuals. The median of the mtDNA variants was 38 (range: 26-55). No variants were found in 9/22 tRNA-coding genes. MT-TRNP and the displacement loop (D-loop) had the most polymorphisms (113 and 74 variants, respectively) (Supplementary Table 1). No disease mutations with 'con rmed' status in the current MITOMAP database [39] were found in any individuals.
The lowest heteroplasmy level of mtDNA variants was 10% for several non-coding variants. The levels of heteroplasmy of some mutations (m.3206C>T, m.4824A>G, m.8473T>C, m.16179CA>C, and m.16183A>C) varied from approximately 10-100% (homoplasmy) and their allele frequency in the studied population was 11%. The m.16189T>C polymorphism causing a lower mtDNA copy number was present in 29/118 (24.58%) samples as homoplasmy, and in one healthy individual at a heteroplasmy level of 54.6% (Supplementary Table 2).

Classi cation of mitochondrial haplogroups in the Korean population
We examined all haplogroups previously reported in the Korean population in Korea1K (n = 1,094) [40], and listed in the Korean National Standard Reference Variome Database of whole genomes (KoVariome) [41]. Eleven variants (m.73A>G, m.263A>G, m.750A>G, m.1438A>G, m.2706A>G, m.4769A>G, m.7028C>T, m.8860A>G, m.11719G>A, m.14766C>T, and m.15326A>G) with an overall frequency ≥ 50% that are widespread across all lineages (L, M, and N) [42] were predominant in our study. Among these, ve mtDNA variants present at ≥ 80% in lineages L, M, or N were found in all individuals. The rare mutation m.73A>C was observed in one individual in our population. The most dominant haplogroup for the Korean population was found to be D (34.74%), followed by M (16.1%). These haplogroups are prevalent in Asian populations [43] and were diverse in our study subgroups (Figure 2A, Table 1). The D2 subhaplogroup, de ned by mutation m.16271T>C [44] and possibly equating to D4e1b [45], was observed in one subject. Consequently, 87.8% of haplogroup D belonged to the D4 subgroup, which had the highest frequency in our Korean population; the D5 subgroup was observed in all remaining subjects. These results were consistent with the data of Korean1K, which includes sequence data from 1,094 whole genomes. The distributions of the haplogroups in the Korean population were obtained from Korea1K [40], KoVariome [41] and ve super-populations in the 1000 Genomes Project (1KG) [31] and then compared to our data. Our study showed the concordance of the haplogroups of the Korean and East Asian populations (1KG-EAS) ( Figure 2B). Other super-populations had distinct mtDNA haplogroups, as described elsewhere [46]. Haplogroups M, A, H and L were most prevalent in South Asian (1KG-SAS), American (1KG-AMR), European (1KG-EUR) and African (1KG-AFR) populations, respectively.

Correlations between annotated mtDNA variants and druginduced toxicity
The D-loop is 1,122 bp in length, and consists of two hypervariable regions (HVI at nucleotides 16024-16383 and HVII at nucleotides 57-372) and a tandem repeat of poly(C) at nucleotides 303-315 [47]. In all subjects, the mtDNA D-loop region had 74 polymorphisms. All 118 samples had the polymorphism m.263A>G in HVII and one subject carried m.73A>C, which was reported to be rare in Asian populations We observed a total of 40 mitochondrial variants in mitochondrially encoded 12S rRNA (MT-RNR1) and mitochondrially encoded 16S rRNA (MT-RNR2) genes. Several mtDNA mutations associated with antibiotic-induced toxicity were detected in these two genes ( Table 2). The m.1555A>G and m.1494C>T mutations are well known to be associated with ototoxicity in patients treated with aminoglycoside antibiotics, but were not observed in our population. We may not have been able to detect these rare variants, reported to be found at a rate of 2 per 2,922 individuals (0.07%) in the Korean population [48], due to the small sample size. Other aminoglycoside-induced hearing loss variant, m.1189T>C, was observed in 1.69% of all samples; the rate was previously reported as 0.34% for the Korean population [49,50]. Variant at position 961 was identi ed at a frequency of 5.93%; these are more likely to be found in Korean than Asian populations (1-3% in the 1KG). The polymorphism of m.2706A>G transition, a variation previously suggested to lead to a predisposition to linezolid-associated lactic acidosis [51,52], was dominant in all subjects. The m.3010G>A allele, which is associated with linezolid-induced mitochondrial toxicity, was more common in the Korean population (30.51%), consistent with previous studies [44,]. Lastly, we observed a high frequency of m.10398A>G variant that was reported to in uence metabolic ART effects [53] at homoplasmic level in our study.

Clinical implications of D-loop variants
The D-loop has a regulatory role in mtDNA transcription and replication [54]. Mutations in the D-loop region may signi cantly affect mtDNA copy number and the gene expression of mitochondrial genomes, thereby disrupting the function of mitochondria, oxidative phosphorylation and ATP production. Here, we found the largest number of polymorphisms in the D-loop region in all subjects. Notably, 25.42% of the samples harboured the mutation m.16189T>C, which is very close to the mtDNA origin of replication and is signi cantly associated with various non-communicable diseases [55][56][57][58]. This could be explained by the lower binding a nity of proteins to the region with the 16189C variant [55]. A negative correlation between continuous cytosines at the mtDNA segment 16180-16195 in HVI and mtDNA copy number was reported previously in human peripheral blood cells [59]. The reduction of mitochondrial copy number might lead to the mitochondrial dysfunction that is potentially a mechanism of drug-induced toxicity [60]. It suggested that m.16180T>C variant might increase the risk of developing adverse drug reaction. Thus, the impact of this variant should be considered in further investigation.
The non-synonymous m.10398A>G polymorphism (overall frequency = 70.34%) was reported to be associated with long-term ART-induced toxicity. Notably, most of the variants mentioned above showed homoplasmy (> 98.4% of all mtDNA sequences analysed [61]). The phenotypic expression of pathogenic and heteroplasmic mtDNA mutations could be modulated by homoplasmic mtDNA variants [53]. This suggests that determination of these variants prior to treatment could be useful to identify tuberculosis and HIV patients at risk of drug-induced toxicity.
The frequency of length heteroplasmy in the homopolymeric C-stretch regions was as low as ~4% in our population, and we detected only one individual with possible length heteroplasmy at position 303-315. Compared to point heteroplasmy, length heteroplasmy is more common and less population-speci c Mitochondrial rRNA more closely resembles bacterial than human rRNA. However, genetic polymorphisms in these genes can result in greater similarity between mitochondrial and bacterial rRNA, thereby facilitating the binding of anti-microbials (most notably aminoglycosides). In cases of tuberculosis infection treated via multi-antibiotic administration, the associations of mtDNA instability and variation with drug-induced toxicity should be taken into consideration. The polymorphisms in the mitochondrial RNR1 and RNR2 genes were associated with increased mitochondrial and clinical adverse effects, most commonly ototoxicity and peripheral neuropathy [67,68]. The mutation of mitochondrial 12S rRNA gene is associated with both aminoglycoside-induced deafness and non-syndromic hearing loss. This study detected higher allele frequency in some variants (m.663A>G, m.961T>C, m.3010G>A) compared to previous studies [44,53]. This suggested that these variants should be considered carefully in studies of the Korean population. It has recently been reported that mtDNA polymorphisms may affect respiratory chain function and rst-line anti-tuberculosis drug-induced liver injury. However, the underlying mechanisms have yet to be elucidated [69,70].
The rare mutations observed in our clinical samples may have highly disruptive effects on mitochondrial ribosomes; based on clinical data, potential disruption is classi ed as 'proven' or 'not enough evidence' (NEE) [71,72]. Among 52 and 145 mutations in the MT-RNR1 and MT-RNR2 genes, respectively, only three (m.1494C>T, m.1555A>G and m.1843T>C) were 'proven' in clinical practice. Most of the remaining variants were classi ed as NEE, so it is not yet clear whether changes in mitochondrial ribosomes affect the binding sites of antibiotic drugs to reduce drug toxicity. Other mtDNA-encoded ribosome components were suggested to have potential bene ts in terms of drug susceptibility and should therefore be investigated further in clinical studies. In the present study, only m.2150T>TA, classi ed as NEE, was observed (at a rate of 4.24%; 5/118 samples) as a homoplasmic mutant.
In addition, adverse drug reactions were reported in patients inheriting pathogenic mtDNA mutations. Antibiotic drugs were reported to be associated with optic neuropathy in Leber's hereditary optic neuropathy (LHON) carriers. In patients carrying the m.11778G>A (MT-ND4) mutation, erythromycin has the potential to catalyse a bioenergetic crisis at onset of LHON [73], while ethambutol was suggested to have a synergistic and deleterious effect on tissue speci city, as reviewed elsewhere [74]. Therefore, the use of erythromycin and, by extension, other macrolides, should be avoided in patients with pre-existing pathogenic mtDNA mutations.
Haplotype and risk assessment of diseases and druginduced toxicity mtDNA polymorphisms have been shown to be associated with various pathophysiological conditions, and could play a role in various physiological and pathological characteristics [3]. The high variability in human mitochondria has been investigated for common diseases [2]. Ninety-ve mitochondrial markers in the Mitomap database have been con rmed to be pathogenic, as of 2020 [4]. Pathogenic mtDNA mutations were commonly found in a number of major haplogroups in the general population, indicating that many healthy individuals carry these potentially harmful mtDNA mutations. It was reported that, among 200 healthy individuals, at least one would harbour a pathogenic mtDNA mutation with the potential to cause disease in the next generation [5]. Therefore, mtDNA mutations could serve as biomarkers facilitating early detection and prediction of the prognosis of disease. However, our study identi ed no pathogenic variants, possibly due to its small sample size.
In European populations, it has been reported that speci c mtDNA haplogroups were associated with ART-induced peripheral neuropathy and metabolic disorders [10][11][12][13][14][15]. In addition, many studies have demonstrated relationships between haplogroups and cancer risk. However, at the time of our study, there had been no investigations of the relationships between haplogroups and drug-induced toxicity for Korean population. Haplogroups vary widely across ethnic groups, even between South and East Asians. Therefore, the characteristics of haplogroups in speci c populations should be considered when exploring the susceptibility of these populations to diseases and drug-induced toxicity.

Conclusions
Drug-induced toxicity via mitochondrial dysfunction has been studied extensively. However, the contributions of mitochondrial variants to drug-induced toxicity remain unclear. The present study identi ed distinct haplotypes in a Korean population, consistent with the results reported previously by Korea1K. Despite the relatively small sample size of our study, we investigated correlations between mtDNA characteristics and risk factors for certain diseases and drug-induced toxicity, through NGS analysis. Further studies including larger cohorts are needed to examine the associations of mtDNA with adverse drug reactions in the Korean population. For this issue, we are going to extend the investigation in association with antituberculosis drug-induced toxicity such as drug-induced liver injury and peripheral neuropathy.

Declarations
Ethics approval and consent to participate The study was conducted following the Declaration of Helsinki and institutional criteria. The institutional review boards (IRBs) of all participating sites reviewed and approved the research. Written informed consents were obtained from all participants.

Consent for publication
Not applicable Availability of data and materials The datasets generated and/or analysed during the current study are available in the PRJNA774877 repository, https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA774877

Competing interests
The authors state that they have no con ict of interest.

Funding
Not applicable

Authors' contributions
Pham Vinh Hoa involved on study design and experimental conduction, data analysis and manuscript preparation; Nguyen Van Lam involved on data analysis and manuscript preparation; Hye-Eun Jung involved on study design, data analysis, and manuscript editing; Yong-Soon Cho involved on data analysis and manuscript revision; Jae-Gook Shin involved on project and study design, and data analysis, and manuscript edition and funding of grant.