Coverage Rate of ADME Genes: Application to CYP2C8, CYP2D6 and CYP3A Genes for Personalized Chloroquine Treatment Against Coronavirus

Background: The latest studies have shown the effectiveness of Chloroquine against Coronavirus. However, the statistical data on tolerance and effectiveness that must be taken into account before proposing a treatment to a patient are often lacking for these promising results. Since the CYP2C8, CYP2D6 and CYP3A Absorption, Distribution, Metabolism and Elimination (ADME) genes are involved in the drug response to Chloroquine, we are interested in studying the variations of these genes. Methods: The purpose of this study is to compare the current genotyping and enrichment platforms in order to determine which of them provides the best coverage. Conclusions: This will allow us to carry out genome-wide association studies (GWAS) with the aim of finding new therapeutic targets against Coronavirus using Chloroquine.

The interactions among these many genes and pathways are very complex, and current commercial platforms do not provide good coverage of the ADME variants [10,11].
This analysis aims to calculate the coverage rate of the CYP2C8, CYP2D6 and CYP3A ADME genes for each of the variant and amplicon lists in our collection. To this end, we used the chromosome and position of each variant from the ADME lists according to build hg19 and, using scripts written in Python, calculated the coverage provided by our interest lists according to the formula:
coverage rate (%) = (number of ADME variants covered by the interest list / total number of ADME variants) × 100
We also considered the coverage of the markers of interest that can be achieved through markers that are in linkage disequilibrium (LD) with them. To refine the work and make it more significant, we compared the theoretical results expected from the targeted coverage of the different technologies with the practical results obtained. The result of this analysis tells us which interest lists best cover the ADME genes.
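The coverage computation described above, matching ADME variants to an interest list by chromosome and position, can be sketched as follows. This is a minimal illustration and not the authors' script: the function names, the handling of the "chr" prefix, and the toy coordinates are all assumptions made for the example.

```python
from typing import Iterable, Set, Tuple

Locus = Tuple[str, int]  # (chromosome, position); hg19 coordinates assumed


def normalize_chrom(chrom: str) -> str:
    """Drop an optional 'chr' prefix so that 'chr10' and '10' compare equal."""
    chrom = chrom.strip()
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def to_loci(records: Iterable[Tuple[str, int]]) -> Set[Locus]:
    """Turn (chromosome, position) records into a set of normalized loci."""
    return {(normalize_chrom(c), int(p)) for c, p in records}


def coverage_rate(adme_variants, platform_variants) -> float:
    """Percentage of ADME loci that also appear in the platform list."""
    adme = to_loci(adme_variants)
    if not adme:
        raise ValueError("the ADME list is empty")
    covered = adme & to_loci(platform_variants)
    return 100.0 * len(covered) / len(adme)


# Toy data (hypothetical positions, not taken from the study).
toy_adme = [("chr10", 100), ("chr22", 200), ("7", 300), ("7", 400)]
toy_chip = [("10", 100), ("chr7", 300), ("1", 5)]
print(coverage_rate(toy_adme, toy_chip))
```

Using sets of (chromosome, position) pairs keeps the lookup independent of list order and of duplicated entries; it does assume that both lists use the same genome build.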

Data
We have two sets of lists, ADME lists and interest lists (Table 1), for which we intend to calculate coverage rates.
Table 1 (partially recovered): Axiom (~85 500 variants); HaloPlex (~21 000 genes)
ADME variants are extracted from genes that were determined to be associated with drug metabolism [12]. The 34 variants used in this work are extracted from the CYP2C8, CYP2D6 and CYP3A genes, the main isoforms affected by or involved in the metabolism of Chloroquine [13].
In this study we focus on genotyping lists (Omni and Axiom) and enrichment lists (SureSelect and HaloPlex). These platforms are likely to cover ADME genes.

Axiom coverage
Axiom® is a solution designed by Affymetrix for the genotyping of large sample collections such as those screened at biobanks, genome centers, and core labs. The arrays incorporate multiple content categories, including a genome-wide association study (GWAS) panel of markers for genome-wide coverage in major ethnic groups, rare coding SNPs and indels for exome analysis, pharmacogenomic markers, eQTLs, and newly discovered loss-of-function variants, including sequence insertions and deletions from recent exome sequencing initiatives [15]. We included Axiom in our comparison because it contains pharmacogenomic markers.

SureSelect coverage
The SureSelect list is a list of the polymorphisms covered by the Agilent Technologies hybridization capture product (http://www.genomics.agilent.com/). This list includes 554,751 amplicons likely to cover the variants of each of the ADME lists.
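For an enrichment list made of amplicons, a variant counts as covered when its position falls inside at least one amplicon on the same chromosome. The sketch below shows one way to test this efficiently for a list of several hundred thousand intervals. It is an illustration under stated assumptions: amplicons are given as (chromosome, start, end) with inclusive 1-based coordinates, and the function names are hypothetical. A BED-style file (0-based, half-open) would need converting first.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(amplicons):
    """Group amplicons by chromosome and merge overlapping intervals.

    amplicons: iterable of (chrom, start, end), inclusive coordinates (assumed).
    Returns {chrom: (sorted_starts, matching_ends)}.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in amplicons:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                # Overlaps the previous interval: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def is_covered(index, chrom, pos):
    """True if pos lies inside some amplicon on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos <= ends[i]


# Toy amplicons (hypothetical coordinates).
toy_index = build_index([("10", 100, 200), ("10", 150, 300), ("22", 50, 60)])
print(is_covered(toy_index, "10", 250))
```

Merging overlapping amplicons first means that a single binary search per variant is enough, since after merging at most one interval can contain a given position.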

HaloPlex coverage
HaloPlex technology provides outstanding performance, streamlined workflow, and low sample input requirements for next generation sequencing of human exomes. The HaloPlex Exome has been optimized to provide comprehensive coverage of the coding regions of the human genome [16].

Choice of programming language
Finding a programming language suited to a given project is not an easy task. New languages are created and old ones updated almost daily. Improvements to programming languages make programs more reliable, faster to develop and easier to maintain.
Among the reference languages is C, which dates from the 1970s and is still in current use. It combines the features of high-level languages with features close to those of assembly languages.
Compared with C, Python is relatively slow in terms of execution time. However, if we take into consideration the time required for programming, and the fact that biological information is mostly processed in text form, Python is better suited than C because of its powerful text processing. Thus, since our project is based largely on the manipulation of files and databases (extraction and processing of information), we opted for Python, which makes this task easier than C does.
Linkage disequilibrium
Linkage disequilibrium (LD), the non-random association of alleles from different loci, is often the basis for evaluating the association of genomic variation with human traits among unrelated subjects. If such an association is found between a particular marker locus and the phenotype, it suggests that either the variation at that marker locus affects the phenotype of interest, or that the variation of that marker locus is in LD with the true phenotype-related locus, which was not genotyped [17].
To see whether the genotyping platforms could achieve better coverage of the ADME variants of interest when the linkage disequilibrium between those variants and the SNPs on the chips is taken into account, we carried out LD calculations.
In our case, we are interested only in the variants that can be captured, that is, the variants that are in LD with a squared correlation greater than or equal to 0.8 (r² >= 0.8). We did this only for Omni 5.0 and Axiom, which give the best coverage rate for the variants of the core ADME list.
To compute the LD, we used the program PLINK, with a command that extracts the variants of the 1000 Genomes database that are in LD with the variants of the core ADME list; in that command, "msysnps.txt" is a list of SNP IDs.
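The PLINK command itself is not reproduced in the text, and the sketch below does not attempt to reconstruct it. It only illustrates the step that follows: reading a pairwise LD report and counting an ADME variant as covered if it is on the chip or if a proxy with r² >= 0.8 is on the chip. The column names (SNP_A, SNP_B, R2) are those of the table PLINK writes for its `--r2` report; the function names and the toy rows are assumptions made for the example.

```python
from collections import defaultdict


def parse_plink_ld(lines, r2_min=0.8):
    """Map each index SNP (SNP_A) to the set of SNPs in LD with it.

    lines: iterable of whitespace-delimited rows, header first, with at
    least the columns SNP_A, SNP_B and R2.
    """
    proxies = defaultdict(set)
    header = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if header is None:
            header = {name: i for i, name in enumerate(fields)}
            continue
        if float(fields[header["R2"]]) >= r2_min:
            proxies[fields[header["SNP_A"]]].add(fields[header["SNP_B"]])
    return proxies


def covered_with_ld(targets, chip_snps, proxies):
    """Targets that are on the chip directly or through an LD proxy."""
    chip = set(chip_snps)
    return {
        t for t in targets
        if t in chip or (proxies.get(t, set()) & chip)
    }


# Toy LD report (hypothetical SNP IDs and values).
toy_ld = [
    " CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2",
    " 10 100 rsA 10 150 rsX 0.95",
    " 10 100 rsA 10 180 rsY 0.50",
    " 22 200 rsB 22 210 rsZ 0.79",
]
toy_proxies = parse_plink_ld(toy_ld)
toy_covered = covered_with_ld(["rsA", "rsB", "rsC"], ["rsX", "rsZ", "rsC"], toy_proxies)
print(sorted(toy_covered))
```

In the toy data, rsB is not counted as covered: its only candidate proxy, rsZ, is on the chip but falls just below the 0.8 threshold.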

Results And Discussion
To calculate the coverage, we relied on the variants contained in each of the genotyping chips.
The physical coverage rates of the CYP2C8, CYP2D6 and CYP3A ADME genes by the genotyping platforms described above are recorded in Table 2. We also took into consideration the coverage that can be achieved through LD with the variants of the ADME lists (Table 3). As shown in the table, even when the SNPs of the Omni platforms that are in LD with the variants of the ADME list are taken into account, the coverage rates increase only slightly and therefore stay sub-optimal.
For the Axiom platform, the coverage increases significantly when LD is taken into account, up to 88.24%, that is, 30 of the 34 variants of our genes of interest. Even with the markers that are in LD taken into consideration, this list's coverage rate does not reach 100% (Fig. 1).
Moreover, even by combining the five platforms, and taking LD into consideration, we will never be able to cover the two variants rs72549353 and rs72549357 (Table 4).
Table 4 Summary table of genotyping platforms coverage
For enrichment platforms, we relied on probes contained in these lists.
The coverage rates of the ADME genes by the HaloPlex and SureSelect enrichment platforms are recorded in the following summary tables (Tables 5-6). Of the enrichment lists described previously, the SureSelect platform provides the best coverage of the ADME variants, up to 33 of the 34 variants of our genes of interest (97.06%), which is sufficient to conduct pharmacogenomics studies with this tool.
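Combining platforms, as considered in this section, amounts to taking the union of the variants each platform covers and checking what is left over. The sketch below illustrates that bookkeeping only. The platform labels "chip_A" and "chip_B" and the IDs "rsTOY1" and "rsTOY2" are hypothetical placeholders; the two real rsIDs are reused from the text simply as examples of variants that no genotyping list reaches.

```python
def combined_coverage(targets, platform_hits):
    """Union coverage across platforms.

    targets: iterable of variant IDs of interest.
    platform_hits: {platform_name: set of variant IDs it covers,
                    directly or through LD}.
    Returns (covered, missing, rate_percent).
    """
    targets = set(targets)
    if not targets:
        raise ValueError("the target list is empty")
    covered = set().union(*platform_hits.values()) & targets
    missing = targets - covered
    return covered, missing, 100.0 * len(covered) / len(targets)


# Toy example: two placeholder chips, four targets.
toy_targets = ["rsTOY1", "rsTOY2", "rs72549353", "rs72549357"]
toy_hits = {
    "chip_A": {"rsTOY1"},
    "chip_B": {"rsTOY1", "rsTOY2", "rsOTHER"},  # rsOTHER is not a target
}
toy_cov, toy_missing, toy_rate = combined_coverage(toy_targets, toy_hits)
print(sorted(toy_missing), toy_rate)
```

Intersecting with the target set before computing the rate keeps off-target content on a chip from inflating the result.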
Due to technological limitations, complex genomic regions, including certain ADME genes, are generally excluded from high-throughput genotyping and sequencing chips [18].
Although the quality of clustering is generally good for the vast majority of genetic variants present on such commercial platforms, it is not uncommon for clustering to function poorly in highly homologous genomic regions such as those of several ADME genes, including, but not limited to, CYP2C8, CYP2D6, CYP3A4 or CYP3A5 [8].
Previous studies have shown the limitations of genome-wide methods for pharmacogenomic testing.
Gamazon et al. [19] focused on a set of the genes most important in pharmacogenomics and personalized medicine, using only genotyping platforms. Their results demonstrated that, even taking into account the SNPs that are in LD, the coverage rate of these genes by genotyping platforms is sub-optimal.
Another study [7] also evaluated sequencing platforms. The HaloPlex enrichment platform provided the best coverage of ADME variants, but this coverage remains incomplete across all of the ADME genes.
There are also commercial chips with targeted pharmacogenetic content, such as the DMET (Drug Metabolizing Enzymes and Transporters) panel from Affymetrix or the iPLEX PGx Pro panel from Agena, which provide targeted coverage of the most difficult pharmacogenetic variants. However, these panels were not examined here since they do not offer genome-wide coverage; they could be used as complementary tests alongside a genome-wide set in pharmacogenomic studies [20].

Conclusion
Recent research has shown that Chloroquine could be a promising drug against coronavirus.
The work we undertook aimed to find the platform that best covers the CYP2C8, CYP2D6 and CYP3A genes, the main isoforms affected by or involved in the metabolism of Chloroquine.
To do this, we developed scripts in Python to calculate the coverage rate for each of these chips. Using the PLINK tool, we estimated the coverage rate obtained by the different chips when the SNP markers that do not themselves cover the ADME lists but are in linkage disequilibrium (LD) with the variants of these lists are taken into consideration.
The Axiom genotyping platform and the SureSelect enrichment platform provided both genome-wide and pharmacogene coverage, which is crucial for discovering new variants responsible for adverse drug effects. This combination, which showed the best coverage of the core list, will help in the design of pharmacogenomic studies and may make it possible to find new therapeutic targets in the fight against Coronavirus using Chloroquine treatment.

Declarations
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the first author or corresponding author on reasonable request.

Author contributions
NZ, LK, OB, YL and YZ contributed to conceptualization, data curation and formal analysis of the manuscript. NZ, IH, JT, KS and AC contributed to investigation, methodology and software. All authors contributed to validation and visualization, and participated in writing the original draft and editing the manuscript.

Data availability statement
All data generated or analyzed during this study are included in this published article.

Declaration of Competing Interest
The authors declare no conflict of interest.

Ethics approval and consent to participate
N/A.

Consent for publication
The contents and publication of the manuscript have been approved by all coauthors.

Figure 1 Comparison between physical coverage and coverage with LD
