Selection and analysis of CYP2C9 variations
Eighteen CYP2C9 variants associated with reduced enzyme function were selected and genotyped in 101 Jordanian individuals of Arab descent (Table 2 and Additional file 4). The defective allele *2 (rs1799853) was the most abundant variant (0.094), followed by allele *3 (rs1057910; 0.084). In addition, two rare variants, c.1425A>T (rs1057911) and 50196C>T (rs2017319), were detected; these two SNPs had frequencies of less than 0.005. Together, CYP2C9 *2 and *3 accounted for 17.8% of alleles, and about 32.7% of individuals carried a genotype associated with a reduced or non-functional phenotype. The frequencies of the four genotypes CYP2C9 *1/*1, *1/*2, *1/*3 and *2/*3 were 0.673, 0.158, 0.139 and 0.03, respectively. The genotype frequencies showed no deviation from Hardy-Weinberg equilibrium (HWE; p > 0.05; Table 3).
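The relationship between the genotype and allele frequencies quoted above can be checked with a short calculation. The following is a minimal sketch rather than the authors' pipeline: it assumes only the four rounded genotype frequencies given in the text, and the function and variable names are illustrative.

```python
# Genotype (diplotype) frequencies as reported in the text (rounded values).
genotype_freq = {
    ("*1", "*1"): 0.673,
    ("*1", "*2"): 0.158,
    ("*1", "*3"): 0.139,
    ("*2", "*3"): 0.030,
}


def allele_frequencies(genotypes):
    """Each diplotype contributes half of its frequency to each allele it carries."""
    freqs = {}
    for (first, second), freq in genotypes.items():
        freqs[first] = freqs.get(first, 0.0) + freq / 2
        freqs[second] = freqs.get(second, 0.0) + freq / 2
    return freqs


allele_freq = allele_frequencies(genotype_freq)

# Combined share of the two reduced-function alleles.
reduced_allele_share = allele_freq["*2"] + allele_freq["*3"]

# Share of individuals carrying any genotype other than *1/*1.
non_wildtype_share = 1.0 - genotype_freq[("*1", "*1")]

for allele, freq in sorted(allele_freq.items()):
    print(f"{allele}: {freq:.3f}")
print(f"*2 + *3 alleles: {reduced_allele_share:.3f}")
print(f"non-*1/*1 genotypes: {non_wildtype_share:.3f}")
```

Because the inputs are rounded to three decimals, the recovered allele frequencies agree with the reported ones only to within rounding.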
Table 2. Details and allele frequencies of the 18 CYP2C9 variants.
| SNP ID | Physical Position | Common Name | CYP2C9 Allele | Marker Call Rate (%) | Minor Allele Frequency |
|---|---|---|---|---|---|
| rs1799853 | 96702047 | CYP2C9*2_3608C>T(R144C) | *2 | 100 | 0.094 |
| rs1057910 | 96741053 | CYP2C9*3_42614A>C(I359L) | *3 | 99 | 0.084 |
| rs56165452 | 96741054 | CYP2C9*4_42615T>C(I359T) | *4 | 99 | 0 |
| rs28371686 | 96741058 | CYP2C9*5_42619C>G(D360E) | *5 | 100 | 0 |
| rs9332131 | 96709039 | CYP2C9*6_10601delA(K273X) | *6 | 100 | 0 |
| rs67807361 | 96698494 | CYP2C9*7_5080C>A(L19I) | *7 | 99 | 0.198 |
| rs2256871 | 96708974 | CYP2C9*9_10535A>G(H251R) | *9 | 100 | 0 |
| rs9332130 | 96709037 | CYP2C9*10_10598A>G(E272G) | *10 | 100 | 0 |
| rs28371685 | 96740981 | CYP2C9*11_42542C>T(R335W) | *11 | 100 | 0 |
| rs9332239 | 96748777 | CYP2C9*12_50338C>T(P489S) | *12 | 100 | 0 |
| rs72558187 | 96701715 | CYP2C9*13_3276T>C(L90P) | *13 | 100 | 0 |
| rs72558189 | 96701991 | CYP2C9*14_3552G>A(R125H) | *14 | 100 | 0 |
| rs72558190 | 96707539 | CYP2C9*15_9100C>A(S162x) | *15 | 100 | 0 |
| rs72558192 | 96731936 | CYP2C9*16_33497A>G(T299A) | *16 | 100 | 0 |
| rs869277704 (rs72558188) | 96701970 | CYP2C9*25_3531_3540del10 | *25 | 100 | 0 |
| rs2017319 | 96748635 | CYP2C9_55221C>T(A441A) | NA | 99 | 0.045 |
| rs1057909 | 96741051 | CYP2C9_42612A>G(Y358C) | NA | 100 | 0 |
| rs1057911 | 96748737 | CYP2C9_55323A>T(G475G) | NA | 100 | 0.045 |
Table 3. Distribution of CYP2C9 alleles and genotypes in Jordanian Arabs.
| SNP ID | Allele | Allele Count | Allelic Frequency | Genotype | Genotypic Frequency | Observed Genotypes | Expected Genotypes | χ2 | p-value |
|---|---|---|---|---|---|---|---|---|---|
| rs1799853 | C | 183 | 0.91 | CC | 0.81 | 82 | 82.89 | 1.09 | 0.58 |
| | T | 19 | 0.09 | CT | 0.19 | 19 | 17.21 | | |
| | | | | TT | 0.00 | 0 | 0.89 | | |
| rs1057910 | A | 185 | 0.92 | AA | 0.83 | 84 | 84.72 | 0.85 | 0.65 |
| | C | 17 | 0.08 | AC | 0.17 | 17 | 15.57 | | |
| | | | | CC | 0.00 | 0 | 0.00 | | |
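The HWE test summarized in Table 3 can be sketched as a chi-square goodness-of-fit calculation on the observed genotype counts. This is an illustration under stated assumptions, not the authors' code: the source does not state the degrees of freedom used, and the tabulated p-values appear consistent with 2 degrees of freedom (the conventional test for a biallelic locus uses 1). The helper names are hypothetical.

```python
import math


def hwe_chi_square(n_hom_major, n_het, n_hom_minor):
    """Return (expected genotype counts, chi-square statistic) under HWE."""
    n = n_hom_major + n_het + n_hom_minor
    p = (2 * n_hom_major + n_het) / (2 * n)  # major allele frequency
    q = 1.0 - p                              # minor allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_major, n_het, n_hom_minor)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return expected, chi2


def chi2_upper_tail(chi2, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("only df = 1 or df = 2 are supported in this sketch")


# Observed genotype counts from Table 3.
expected_2, chi2_2 = hwe_chi_square(82, 19, 0)  # rs1799853: CC, CT, TT
expected_3, chi2_3 = hwe_chi_square(84, 17, 0)  # rs1057910: AA, AC, CC

# df = 2 is an assumption inferred from the tabulated p-values.
p_2 = chi2_upper_tail(chi2_2, df=2)
p_3 = chi2_upper_tail(chi2_3, df=2)

print([round(e, 2) for e in expected_2], round(chi2_2, 2), round(p_2, 2))
print([round(e, 2) for e in expected_3], round(chi2_3, 2), round(p_3, 2))
```

For rs1057910, the same calculation gives an expected minor-homozygote count of about 0.72, which is the value needed to arrive at the tabulated χ2 of 0.85.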
LD was measured for pairs of SNPs distributed across the 52-kb region, and significant D′ values were observed spanning the entire genomic region. Most allele pairs of CYP2C9 had a D′ value of 1.0 (indicating complete LD), whereas r² values across the same region showed an LD block between the *7 allele (rs67807361) at exon 1 and the *14 allele (rs72558189) at exon 3. A clear LD block was also observed between CYP2C9*3 (rs1057910) at exon 3 and c.1425A>T (rs1057911) at exon 9, spanning approximately 8 kb (Figure 2A and Additional file 5). The frequency of rs67807361 in the study group differed significantly from that in the other sampled populations (p = 4.9 × 10⁻²²; Table 5). However, a nucleotide BLAT search showed that the 124-bp sequence flanking this SNP had 100% sequence identity with the CYP2C19 and AL583836.1 genes at 10:94762716-94762839. Because the individual probes in the MIP assay bind to a genomic footprint of only ~40 bp, such homologous sequences would likely result in false-positive or false-negative variant calls [17]; this variant was therefore excluded from the analyses.
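The contrast between D′ and r² described above can be illustrated with the standard two-locus definitions. The sketch below is not taken from the source: the haplotype frequency is a hypothetical input (the source reports no haplotype frequencies), chosen to show how two low-frequency alleles that never share a haplotype give D′ = 1 while r² stays close to zero.

```python
def ld_stats(p_ab, p_a, p_b):
    """D, D' and r^2 for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Allele frequencies of *2 and *3 from Table 2; the joint haplotype
# frequency of 0.0 is an ASSUMPTION made for illustration only.
d, d_prime, r2 = ld_stats(p_ab=0.0, p_a=0.094, p_b=0.084)
print(f"D = {d:.4f}, D' = {d_prime:.2f}, r2 = {r2:.4f}")
```

D′ is scaled by the largest value D could take given the allele frequencies, so it reaches 1 whenever one of the four possible haplotypes is absent. r² also depends on how closely the allele frequencies match, which is why the two measures can delineate different blocks.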
Genetic structure of CYP2C9 across populations
The two leading principal components computed from the 13 variants shared between the Jordanian Arab population and the 22 global populations of the 1000 Genomes Project Phase III (1kG-p3) dataset (Figure 3A) captured 60.63% and 21.38% of the variance, respectively, and showed a well-defined separation between Jordanian Arabs and the AFR, EAS, and SAS super-populations. Jordanian Arabs showed a close affinity with EUR (comprising GBR, FIN, TSI, CEPH, and IBS), which was confirmed by pairwise Fst analyses (Additional file 6). The lowest level of differentiation was observed between the Jordanian Arab population and GBR (Fst = 5.97 × 10⁻³), followed by IBS (Fst = 6.39 × 10⁻³) and FIN (Fst = 6.69 × 10⁻³), whereas the greatest divergence was observed with GWD (Fst = 8.54 × 10⁻²).
The lack of genomic data in the 1000 Genomes Project for additional ethnic groups, such as Middle Eastern populations (ESN), can reduce robustness and potentially bias geography-based genomic analyses. Therefore, a secondary analysis was performed to include under-represented Arab populations. The two leading principal components computed from the *1, *2 (rs1799853) and *3 (rs1057910) allele frequencies shared between the Jordanian Arab population and the 18 global reports, including ESN (Table 4), captured 98.35% and 1.62% of the variance, respectively, suggesting a well-defined genetic separation between Jordanian Arabs and the AFR and EAS populations (Figure 3B). In addition, defined clusters of EUR and ESN populations were found, which were further validated using pairwise Fst analyses (Additional file 7). The lowest level of differentiation was observed between the Jordanian Arab and Saudi Arabian populations (Fst = 7.4 × 10⁻⁴), followed by the Italian (Fst = 1.62 × 10⁻³) and Turkish populations (Fst = 1.8 × 10⁻³), whereas the greatest divergence was observed with the Korean population (Fst = 1.13 × 10⁻¹).
Table 4. CYP2C9 allele frequencies (%) in the 19 worldwide populations used in the MDS analysis (18 public reference populations plus the Jordanian Arab population of this study).
| Region | Population | CYP2C9*1 (%) | CYP2C9*2 (%) | CYP2C9*3 (%) | N | Reference |
|---|---|---|---|---|---|---|
| Near Eastern | Jordanian* | 82.2 | 9.4 | 8.4 | 101 | Current Study |
| Near Eastern | Lebanese | 79.2 | 11.2 | 9.6 | 161 | [28] |
| Near Eastern | Saudi Arabian1 | 80.8 | 11.0 | 8.0 | 112 | [29] |
| Near Eastern | Egyptian | 82.0 | 12.0 | 6.0 | 247 | [30] |
| Near Eastern | Tunisia | 78.0 | 13.9 | 8.1 | 258 | [31] |
| Near Eastern | Turkish | 79.4 | 10.6 | 10.0 | 499 | [32] |
| Near Eastern | Iranian | 79.3 | 11.0 | 9.7 | 160 | [33] |
| European | British | 79.0 | 12.5 | 8.5 | 100 | [34] |
| European | Italian | 79.6 | 11.2 | 9.2 | 157 | [35] |
| European | Spanish | 74.5 | 15.6 | 9.8 | 102 | [36] |
| African | Afro-American1 | 98.5 | 1.0 | 0.5 | 100 | [37] |
| African | Afro-American2 | 94.6 | 2.5 | 1.3 | 120 | [38] |
| African | Ethiopian | 93.3 | 4.3 | 2.3 | 150 | [35] |
| East Asian | Chinese1 | 98.3 | 0.0 | 1.7 | 115 | [39] |
| East Asian | Chinese2 | 95.1 | 0.0 | 4.9 | 102 | [40] |
| East Asian | Japanese1 | 98.2 | 0.0 | 1.8 | 140 | [41] |
| East Asian | Japanese2 | 97.9 | 0.0 | 2.1 | 218 | [42] |
| East Asian | Korean | 98.9 | 0.0 | 1.1 | 574 | [43] |
| East Asian | Taiwanese | 97.4 | 0.0 | 2.6 | 98 | [37] |
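The pairwise Fst comparisons reported above can be illustrated with the allele frequencies in Table 4. The source does not state which Fst estimator was used, so the sketch below assumes the simple heterozygosity-based definition, Fst = (HT - HS) / HT, with the two populations weighted equally and no sample-size correction. It is therefore not expected to reproduce the values in Additional file 7, only their ordering.

```python
# Allele frequencies (%) for *1, *2, *3 taken from Table 4.
TABLE4 = {
    "Jordanian": (82.2, 9.4, 8.4),
    "Saudi Arabian": (80.8, 11.0, 8.0),
    "Korean": (98.9, 0.0, 1.1),
}


def normalize(percentages):
    """Rescale so the frequencies sum to 1 (some published rows sum to slightly under 100)."""
    total = sum(percentages)
    return [value / total for value in percentages]


def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pairwise_fst(pop1, pop2):
    """Fst = (H_T - H_S) / H_T for two equally weighted populations (assumed estimator)."""
    f1, f2 = normalize(pop1), normalize(pop2)
    h_s = (heterozygosity(f1) + heterozygosity(f2)) / 2   # mean within-population
    pooled = [(a + b) / 2 for a, b in zip(f1, f2)]
    h_t = heterozygosity(pooled)                          # total (pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


fst_saudi = pairwise_fst(TABLE4["Jordanian"], TABLE4["Saudi Arabian"])
fst_korean = pairwise_fst(TABLE4["Jordanian"], TABLE4["Korean"])
print(f"Jordanian vs Saudi Arabian: {fst_saudi:.5f}")
print(f"Jordanian vs Korean:        {fst_korean:.5f}")
```

Even with this simplified estimator, differentiation from the Saudi Arabian sample is orders of magnitude smaller than from the Korean sample, in line with the ranking described in the text.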
These findings were investigated further by extending the variant analysis to populations from 118 reports across EUR and ESN (Additional File 3). MDS analysis showed that the two leading principal components of CYP2C9 alleles *2 and *3, shared between the Jordanian Arab population and the 118 reports from EUR and ESN, captured 64.45% and 35.55% of the variance, respectively. Jordanian Arabs clustered with Turkish, Israeli, Caucasian, Italian, Romanian, Iranian and Lebanese populations (Figure 3C).
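When only two variables (*2 and *3 frequencies) enter the analysis, the two leading components necessarily account for all of the variance, which is why the reported shares sum to 100%. The sketch below illustrates this with a two-variable principal component decomposition. It is an assumption-laden example: it uses only the Near Eastern and European rows of Table 4 rather than the 118 reports in Additional File 3, and the source does not describe the exact MDS procedure (classical MDS on Euclidean distances gives the same coordinates as PCA).

```python
import math

# (*2 %, *3 %) for the Near Eastern and European populations in Table 4.
points = [
    (9.4, 8.4),    # Jordanian
    (11.2, 9.6),   # Lebanese
    (11.0, 8.0),   # Saudi Arabian
    (12.0, 6.0),   # Egyptian
    (13.9, 8.1),   # Tunisia
    (10.6, 10.0),  # Turkish
    (11.0, 9.7),   # Iranian
    (12.5, 8.5),   # British
    (11.2, 9.2),   # Italian
    (15.6, 9.8),   # Spanish
]


def explained_variance_2d(data):
    """Variance shares of the two principal components of 2-D data."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix from its trace and determinant.
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    root = math.sqrt(max(trace * trace - 4 * det, 0.0))
    lam1, lam2 = (trace + root) / 2, (trace - root) / 2
    return lam1 / trace, lam2 / trace


share1, share2 = explained_variance_2d(points)
print(f"component 1: {share1:.2%}, component 2: {share2:.2%}")
```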
Pharmacogenetic analyses by biogeographic grouping system
Across the nine biogeographical groups, 27% of subjects were of East Asian origin, followed by Europeans (26%), Central/South Asians (13%), Near Easterners (12%), Americans (7%), Latinos (6%), African Americans/Afro-Caribbeans (4%), Sub-Saharan Africans (4%), and Oceanians (1%; Table 1). Distinct differences were found among these populations, with a direct impact on ibuprofen clinical outcomes (Figure 4B). The combined frequencies of the CYP2C9 *2 (rs1799853) and *3 (rs1057910) alleles were significantly higher in Central/South Asians (0.224), followed by Near Easterners (0.212), Europeans (0.203), Jordanian Arabs (0.178), and Latinos (0.116), indicating decreased metabolism and clearance of ibuprofen compared with Americans (0.064), Oceanians (0.045), East Asians (0.04), African Americans/Afro-Caribbeans (0.036) and Sub-Saharan Africans (0.024; Table 7). These variant alleles and genotypes are classified as PharmGKB Level 1A evidence for reduced enzyme function and are therefore associated with recommended changes to ibuprofen dosing [7]. Translating the diplotypes into the specific dosing guidelines for individual ibuprofen-diplotype pairs [7, 11] showed that Central/South Asian, Near Eastern, and European populations are 7.9 and 5.9 times more likely to show impaired CYP2C9 metabolism than Sub-Saharan African and African American/Afro-Caribbean populations, respectively, and 4.9 times more likely than East Asian populations (Table 8 and Additional File 8).
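As a rough plausibility check on the fold differences above, the combined *2 + *3 allele frequencies can be converted into the expected share of individuals carrying at least one reduced-function allele. This is a simplified sketch, not the diplotype-to-phenotype translation used for Table 8: it assumes HWE, treats *2 and *3 as a single reduced-function class, and ignores all other alleles, so it will not reproduce the published ratios exactly.

```python
# Combined *2 + *3 allele frequencies as quoted in the text (Table 7).
combined_freq = {
    "Central/South Asian": 0.224,
    "Near Eastern": 0.212,
    "European": 0.203,
    "Jordanian Arab": 0.178,
    "Latino": 0.116,
    "American": 0.064,
    "Oceanian": 0.045,
    "East Asian": 0.040,
    "African American/Afro-Caribbean": 0.036,
    "Sub-Saharan African": 0.024,
}


def carrier_fraction(q):
    """Expected share of individuals with at least one reduced-function allele under HWE."""
    return 1.0 - (1.0 - q) ** 2


def fold_difference(group, reference):
    """Ratio of expected carrier fractions between two groups."""
    return carrier_fraction(combined_freq[group]) / carrier_fraction(combined_freq[reference])


jordan_carriers = carrier_fraction(combined_freq["Jordanian Arab"])
print(f"Jordanian Arab carriers (expected): {jordan_carriers:.3f}")
for group in ("Central/South Asian", "Near Eastern", "European"):
    ratio = fold_difference(group, "Sub-Saharan African")
    print(f"{group} vs Sub-Saharan African: {ratio:.1f}x")
```

For the Jordanian Arab sample, the expected carrier fraction is close to the 32.7% of individuals observed with a non-*1/*1 genotype.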
Interestingly, this approach also identified a large number of generally less common alleles (Additional File 9). Allele *9 (rs2256871) was significantly over-represented in Sub-Saharan Africans (0.13) but was not detected in other global populations (Figure 4A). Alleles *5 (rs28371686), *6 (rs9332131), *8 (rs7900194) and *11 (rs28371685) were significantly over-represented in African populations (Sub-Saharan Africans and African Americans/Afro-Caribbeans) and under-represented in other populations. Alleles *42 (rs12414460) and *55 (42620C>A) were over-represented in East Asian populations.