DOI: https://doi.org/10.21203/rs.3.rs-572591/v1
Background: The human major histocompatibility complex (MHC) includes three classical class I loci (HLA-A, -B and -C), which are important biomarkers for organ and hematopoietic stem cell transplantation. Polymorphism in the MHC is extremely high, whereas interlocus recombination is rare. We report a rare interlocus recombination between HLA-A and HLA-H, analyzed using next generation sequencing (NGS) and nanopore sequencing.
Results: The HLA-A, -B, -C, -DRB1 and -DQB1 genotypes of the sample were first typed with sequence-specific primer (SSP), sequence-specific oligonucleotide (SSO), Sanger sequencing and NGS methods; however, the HLA-A alleles could not be phased. Nanopore sequencing was then used to resolve the sequence of the novel allele, HLA-A*11:335, which was identified as the product of an interlocus recombination between the HLA-A*11:01:01:01 and HLA-H*02:07/14/18 alleles.
Conclusions: The identification of the interlocus recombination indicated that nanopore sequencing may be the most precise method for HLA typing. Interlocus recombination has been identified as one of the mechanisms involved in the generation of novel HLA alleles.
The human classical class I loci, HLA-A, -B and -C, encode MHC molecules that are expressed on all nucleated cells. Their main function is to present intracellularly processed foreign peptides to cytotoxic T cells [1]. Classical and nonclassical class I genes have a similar structural organization. Exon 1 encodes the leader peptide, and exons 2, 3 and 4 encode the three extracellular domains, α1, α2 and α3, respectively. The 57 residues of the antigen recognition site are located in the α1 and α2 domains [2, 3]. Exon 5 encodes the transmembrane portion of the molecule, and exons 6, 7 and 8 encode the cytoplasmic tail. Several mechanisms have contributed to HLA polymorphism, including crossing over, gene conversion and point mutations [4]. Point mutations can occur as single nucleotide substitutions, which may produce synonymous or non-synonymous changes. Within the antigen binding domain, the rate of non-synonymous changes greatly exceeds that of synonymous changes, suggesting a selection-driven mechanism [5]. Besides point mutations, recombination between homologous chromosomes has been identified as a mechanism that generates a number of HLA alleles. Recombination can involve genes of the same locus or of different loci, producing intralocus or interlocus genomic exchanges [6]. We herein report a novel HLA-A*11 allele, A*11:335, identified in a Chinese bone marrow donor as the product of an interlocus recombination between the HLA-A*11:01:01:01 and HLA-H*02:07/14/18 alleles, and analyze the consequences of this recombination. The recombination was identified mainly by nanopore sequencing.
A total of 2964 specimens (approximately 2% of the total) were sampled from the database of volunteers recruited by the China Marrow Donor Program in 2017 and retyped for HLA-A, -B, -C, -DRB1 and -DQB1. DNA sample 17ZZ2298 was originally extracted from a volunteer’s peripheral blood by the BGI laboratory, a cooperating partner of the China Marrow Donor Program. HLA-A, -B, -C, -DRB1 and -DQB1 typing was first performed with the BGI next generation sequencing typing method RCHSBT [7] (BGI, Shenzhen, China).
Sample 17ZZ2298 was then typed a second time by Sanger sequencing (Shenzhen Tissue Bank Precision Medicine Co., Ltd., China) to verify the RCHSBT results. Because the Sanger HLA-A result (A*01:01/11:126) differed from the RCHSBT result (A*01:01/11:01), the sample was typed a third time using the SSO method (Luminex 3D, One Lambda, USA), which gave A*01:01/11:01, in agreement with RCHSBT. The sample was further typed by MiSeq-based sequencing (One Lambda), which assigned A*01:01/11:126 with the following system comments: “Warning: mismatch in an intron, two or more variants cannot be phased, Locus has a high background position in exon.” The sample was also typed by NGS using commercially available reagents (GenDx, Utrecht, The Netherlands) on a MiniSeq system (Illumina, San Diego, California, USA), and by MinION-based nanopore sequencing (ONT, Oxford, UK). Data were analyzed using the NGSengine software (GenDx).
The mismatched region of the novel allele was searched against the IPD-IMGT/HLA database using the BLASTN tool.
The effects of the 6 missense mutations in exon 5 on the secondary structure of the transmembrane domain were predicted with the PSIPRED online tool (http://bioinf.cs.ucl.ac.uk/psipred/), using the exon 5 amino acid sequences of HLA-A*11:335 and HLA-A*11:01:01:01 as input.
Genotype analysis
Sample 17ZZ2298 was first subjected to high-resolution typing of HLA-A, -B, -C, -DRB1 and -DQB1 by the BGI RCHSBT method [7]. Exons 1-7 of HLA-A, -B and -C, exons 1-3 of HLA-DRB1 and exons 2-3 of HLA-DQB1 were sequenced. The resulting high-resolution assignment was A*01:01:01, 11:01:01; B*15:32:01, 57:01:01; C*06:02:01, 12:03:01; DRB1*07:01:01, 12:01:01; DQB1*03:01:01, 03:03:02. However, Sanger sequencing with three different reagents (CSTB, Biocapital and GenDx) assigned HLA-A as A*01:01/11:126. The sample was then reanalyzed by the SSO method and by MiSeq sequencing-based typing (One Lambda), which gave HLA-A*01:01/11:01 and HLA-A*01:01/11:126, respectively. The MiSeq assignment, however, carried the system comment “Warning: mismatch in an intron, Two or more variants cannot be phased.” This indicated that HLA-A*01:01/11:126 might not be the correct assignment. HLA-A*11:01:01:01 and HLA-A*11:126 differ only at c.874A>G in exon 4 (Fig. 1A).
The sample was then analyzed by MiSeq-based typing (GenDx). The data indicated a new allele, but exon 3 and exon 4 could not be phased with the MiSeq data alone. The MiSeq reads were therefore analyzed together with a small number of MinION reads. The recommended genotype was HLA-A*01:01:01:01/A*11:126 (Fig. 1B), but there were numerous mismatches between exon 4 and exon 6. All mismatches (indicated by blue or red triangles) were in the HLA-A*11 allele and lay between the last heterozygous position in exon 4 (gDNA 1824) and a heterozygous position in intron 5 (gDNA 2437). The bases observed at gDNA 1824 matched the two reported HLA-A alleles. The first mismatched position, in intron 4 (gDNA 1887), was heterozygous (A/C) in this sample, whereas all known HLA-A alleles carry an A at this position. The phasing information showed that the C belongs to the new HLA-A*11 allele (HLA-A*11new). At the last two heterozygous positions (gDNA 2431 and 2437), one allele carried A-A and the other G-T; A-A occurs in many HLA-A alleles, whereas G-T is not present in any known HLA-A allele. When region 1887-2437 was excluded, the remaining sequence matched HLA-A*11:126 exactly; when region 1824-2437 was excluded, it matched HLA-A*11:01:01:01 exactly. The HLA-A typing results obtained with each reagent, together with the final nomenclature, are listed in Table 1.
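The phasing argument above relies on reads that cover two heterozygous positions at once: only such reads show which bases sit on the same molecule. The minimal sketch below illustrates that principle. It assumes each read has already been aligned and reduced to a mapping from gDNA position to observed base, which is a hypothetical representation chosen for illustration and not the NGSengine data model. The toy reads reproduce the A-A / G-T configuration reported at gDNA 2431 and 2437.

```python
from collections import Counter


def phase_two_sites(reads, pos_a, pos_b):
    """Count the base combinations seen on reads that cover both positions.

    Each read is a dict mapping a gDNA position to the base observed there.
    Reads that miss either position carry no linkage information and are
    skipped, which is why short reads cannot phase distant variants.
    """
    haplotypes = Counter()
    for read in reads:
        if pos_a in read and pos_b in read:
            haplotypes[(read[pos_a], read[pos_b])] += 1
    return haplotypes


# Hypothetical toy reads: three span both sites, one covers only gDNA 2431.
reads = [
    {2431: "A", 2437: "A"},
    {2431: "G", 2437: "T"},
    {2431: "A", 2437: "A"},
    {2431: "G"},
]
print(dict(phase_two_sites(reads, 2431, 2437)))
```

The fourth read is ignored because it cannot link the two sites. The same logic explains why variants separated by more than a read length stay unphased with short reads.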
Sequence blast and mutation analysis
The sequence of region 1824-2437 (612 bp rather than 614 bp, because of an “AT” deletion in intron 5) was then searched against the IPD-IMGT/HLA database with BLASTN. All 612 of 612 positions (100%) matched HLA-H*02:07/14/18 exactly, as shown in Table 2 (https://www.ebi.ac.uk/Tools/services/web_ncbiblast/toolresult.ebi?jobId=ncbiblast-E20201104-031132-0326-63995603-p1m). This suggested that the sample contained a new HLA-A*11 allele (HLA-A*11:335) resulting from an interlocus genomic exchange between HLA-A*11:01:01:01 and HLA-H*02:07/14/18. HLA-H and HLA-A are approximately 50 kb apart on chromosome 6. Austin L. Hughes also noted that interlocus recombination has been a recurrent feature in the evolutionary history of the HLA class I region and suggested that class I pseudogenes arose through duplication of class I genes over a long period of time [8]. The alignment of the genomic sequence of HLA-A*11:01:01:01 with A*11:126, A*11:335, H*02:07, H*02:14 and H*02:18 is shown in Fig. 2.
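The fragment length and identity figures quoted above follow from simple arithmetic on inclusive gDNA coordinates. The sketch below assumes, as the text implies, that coordinates are inclusive and that the only length change inside the region is the 2 bp “AT” deletion in intron 5. `percent_identity` is a simplified ungapped comparison written for illustration; it is not the BLASTN scoring model.

```python
def region_length(start, end, deleted_bp=0):
    """Length in bp of an inclusive gDNA region after removing deleted bases."""
    return end - start + 1 - deleted_bp


def percent_identity(query, subject):
    """Percent identical positions for two equal-length, ungapped sequences."""
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


# 2437 - 1824 + 1 = 614 reference positions, minus the 2 bp "AT" deletion.
print(region_length(1824, 2437, deleted_bp=2))  # 612
# Toy sequences: one mismatch in four aligned positions.
print(percent_identity("ACGT", "ACGA"))  # 75.0
```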
Transmembrane property analysis
As shown in Table 3, the sequence of HLA-A*11:335 differs from HLA-A*11:01:01:01 by 10 nucleotide substitutions in exon 5, which affect 9 codons (two substitutions fall in codon 276) and produce 3 synonymous and 6 missense changes. Exon 5 encodes the transmembrane domain of HLA-A. We predicted the effects of the 6 missense mutations on the secondary structure of the transmembrane domain using the PSIPRED online tool (http://bioinf.cs.ucl.ac.uk/psipred/). Although the interlocus recombination between HLA-A and HLA-H introduced 6 missense mutations (3 of them in the transmembrane domain), the prediction showed no destructive effect on the helix structure of the transmembrane domain (Fig. 3).
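Table 3 can be cross-checked mechanically. The sketch below encodes the standard genetic code, reclassifies each codon change listed in Table 3 as synonymous or missense, and converts cDNA positions to codon numbers. The 24-codon leader offset (`LEADER_LENGTH`) is an assumption based on the usual HLA class I signal-peptide length; it is used here because it reproduces the codon positions given in Table 3.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

LEADER_LENGTH = 24  # assumed leader peptide length used for mature numbering

# (first cDNA position listed in Table 3, A*11:01:01:01 codon, A*11:335 codon)
CHANGES = [
    (899, "CTG", "CCA"), (916, "ATC", "GTC"), (934, "ATT", "GTT"),
    (951, "CTC", "CTA"), (956, "GGA", "GTA"), (964, "ATC", "GTC"),
    (987, "GCC", "GCT"), (990, "GTG", "GTA"), (1001, "AGG", "AAG"),
]


def mature_codon(cdna_pos):
    """Map a cDNA nucleotide position to its codon number after the leader."""
    return (cdna_pos - 1) // 3 + 1 - LEADER_LENGTH


def classify(ref_codon, alt_codon):
    """Return (kind, reference amino acid, variant amino acid)."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return ("synonymous" if ref_aa == alt_aa else "missense"), ref_aa, alt_aa


# Count differing bases across all changed codons.
nt_subs = sum(sum(r != a for r, a in zip(ref, alt)) for _, ref, alt in CHANGES)
results = [(mature_codon(pos), *classify(ref, alt)) for pos, ref, alt in CHANGES]
missense = sum(kind == "missense" for _, kind, _, _ in results)
synonymous = len(results) - missense

print(nt_subs, missense, synonymous)  # 10 6 3
for codon, kind, ref_aa, alt_aa in results:
    print(codon, f"{ref_aa}>{alt_aa}", kind)
```

The counts agree with the text: 10 substitutions across 9 codons, of which 6 are missense and 3 are synonymous.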
New allele nomenclature
The nucleotide sequence of the new allele has been submitted to the DNA Data Bank of Japan (Accession No. LC474859) and to the IPD-IMGT/HLA Database [5] (Submission No. HWS10054755). The name HLA-A*11:335 was officially assigned by the WHO Nomenclature Committee in May 2019. This follows the agreed policy that, subject to the conditions stated in the most recent Nomenclature Report [9], names will be assigned to new sequences as they are identified. Lists of such new names will be published in the next WHO Nomenclature Report.
A number of authors [10, 11] have proposed that interlocus recombination or gene conversion is an important mechanism for maintaining MHC polymorphism. However, much of the allelic variation at MHC loci lies in the antigen recognition site encoded by exons 2 and 3, and recombination or gene conversion cannot explain the high rate of nonsynonymous relative to synonymous nucleotide substitution. As suggested in [12], the extremely high level of polymorphism at MHC loci (80-90% heterozygosity) appears to be mainly due to overdominant selection.
Based on our latest data, 191 HLA-A alleles have been identified in Chinese volunteers, and A*11 is common (frequency 23.203%) [13]. In this sample, HLA-A was analyzed with different reagents and methods (SSO, Sanger, NGS and nanopore sequencing); however, the HLA-A alleles could not be phased against the latest database release (3.35), so a novel allele was suspected. Nanopore sequencing allowed the exact sequence of the novel allele (A*11:335) to be determined.
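The allele names in this section follow the WHO nomenclature. In this scheme, the colon-separated fields after the locus denote, in order, the allele group, the specific protein, synonymous coding differences and non-coding differences, optionally followed by an expression suffix. A minimal Python sketch for parsing such names is shown below. The regular expression is a simplification written for this example, not an official validator. `expand_allele_string` is a hypothetical helper that handles only the slash shorthand used in this paper, as in HLA-H*02:07/14/18.

```python
import re

# Simplified pattern: optional "HLA-", locus, "*", 2-4 numeric fields, optional suffix.
ALLELE_RE = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*(\d+(?::\d+){1,3})([NLSCAQ]?)$")


def parse_allele(name):
    """Split an HLA allele name into locus, colon-separated fields and suffix."""
    match = ALLELE_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised HLA allele name: {name}")
    locus, fields, suffix = match.groups()
    return locus, fields.split(":"), suffix


def two_field(name):
    """Truncate an allele name to two-field (protein-level) resolution."""
    locus, fields, _ = parse_allele(name)
    return f"{locus}*{':'.join(fields[:2])}"


def expand_allele_string(shorthand):
    """Expand slash shorthand such as H*02:07/14/18 into full names."""
    head, *others = shorthand.split("/")
    prefix = head.rsplit(":", 1)[0]
    return [head] + [f"{prefix}:{field}" for field in others]


print(two_field("A*11:01:01:01"))  # A*11:01
print(expand_allele_string("H*02:07/14/18"))  # ['H*02:07', 'H*02:14', 'H*02:18']
```

Truncating names to two fields is one way to compare results reported at different resolutions, such as A*11:01:01:01 and A*11:01 in Table 1.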
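One standard way to express a heterozygosity figure such as the 80-90% quoted above is expected heterozygosity under Hardy-Weinberg proportions, H = 1 - Σ p_i², where p_i is the frequency of allele i at the locus. The sketch below computes this quantity. The frequency vectors are hypothetical illustrations, not data from this study or from [12].

```python
def expected_heterozygosity(freqs):
    """Hardy-Weinberg expected heterozygosity, H = 1 - sum(p_i ** 2)."""
    total = sum(freqs)
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"allele frequencies must sum to 1, got {total}")
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical locus with two equally common alleles.
print(expected_heterozygosity([0.5, 0.5]))  # 0.5
# Hypothetical highly polymorphic locus: ten alleles at 10% each (about 0.9).
print(expected_heterozygosity([0.1] * 10))
```

With many alleles at moderate frequencies, H approaches the 80-90% range described for MHC loci.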
The sequence of region 1824-2437 was then searched against the IPD-IMGT/HLA database, and all 612 of 612 positions (100%) matched HLA-H*02:07/14/18 exactly (Table 2). This suggested that the sample contained a new A*11 allele resulting from interlocus genomic exchange between HLA-A and HLA-H (see Supplementary File). HLA-H lies between HLA-A and HLA-G, which are separated by less than 300 kb in the class I region of the MHC [14]. The putative proteins predicted from the novel HLA-H alleles range from 18 to 362 amino acids (AA) in length. Only two alleles, HLA-H*02:07 and HLA-H*02:14, show the full pattern of a transmembrane HLA protein (signal peptide, noncytoplasmic domain, transmembrane domain, cytoplasmic domain, glycosylation site and a disulfide bond); the other 23 alleles lack all or part of these critical domains and/or sites [15]. Gene conversion among loci is considered an important mechanism for creating new HLA alleles [16]. Because HLA-A and HLA-H are closely related and physically close, HLA-A may gain diversity through gene conversion with HLA-H. Although Grimsley et al. suggested that the polymorphisms in HLA-H are not the result of interlocus gene conversion with HLA-A [17], our findings indicate that the polymorphisms in HLA-A may partly result from interlocus gene conversion with HLA-H. The mechanism underlying recombination between the two loci is unknown.
The sequence of A*11:335 differs from HLA-A*11:01:01:01 by 10 nucleotide substitutions, resulting in 3 synonymous and 6 missense mutations in exon 5 (Table 3). Exon 5 encodes the transmembrane domain of HLA-A. PSIPRED analysis showed that, although the interlocus recombination between HLA-A and HLA-H introduced 6 missense mutations (3 of them in the transmembrane domain), these mutations had no predicted destructive effect on the helix structure of the transmembrane domain (Fig. 3). The mechanism and consequences of this interlocus recombination remain largely unknown. Nanopore sequencing can produce long reads (N50 > 30 kb) from a single DNA molecule, whereas Sanger and NGS reads are limited in length (< 1000 bp). Because variants separated by more than one read length cannot be linked to the same molecule, these shorter reads have a limited ability to resolve the sequence of a single allele.
A novel allele, HLA-A*11:335, resulting from interlocus recombination between the HLA-A*11:01:01:01 and HLA-H*02:07/14/18 alleles, was identified in a volunteer from the China Marrow Donor Program. The results indicate that nanopore sequencing may be the most precise method for HLA typing.
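The N50 figure quoted above is the read length L such that reads of length L or longer together contain at least half of all sequenced bases. A minimal sketch of the computation follows. The read lengths are hypothetical and do not come from this sample's MinION run.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    Reads are sorted longest first and their lengths are accumulated. The N50
    is the length of the read at which the running total first reaches at
    least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("no reads supplied")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


# Hypothetical read lengths: 54 bases in total, so the threshold is 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```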
HLA: Human leucocyte antigen; NGS: Next generation sequencing; MHC: Major histocompatibility complex; SSP: Sequence-specific primer; SSO: Sequence-specific oligonucleotide; DNA: Deoxyribonucleic acid; ONT: Oxford Nanopore Technologies; RCHSBT: Reliable, cost-effective and high-throughput sequence based typing; MinION: Oxford Nanopore portable nanopore sequencer; IMGT: International ImMunoGeneTics information system; SNPs: Single nucleotide polymorphisms; kb: Kilobases; PSIPRED: PSI-BLAST-based secondary structure prediction; IPD: Immuno Polymorphism Database; WHO: World Health Organization
Ethics approval and consent to participate
The use of the clinical samples described in this study is covered by the institutional review board of BGI, Shenzhen, China (BGI-S022-T1). Written informed consent was obtained from all subjects before blood samples were taken.
Consent for publication
Not applicable as the manuscript does not have any individual identifiable patient data.
Competing interests
The authors declare no conflicts of interest in association with the present study.
Funding
This work was supported by Beijing Hospital Project (#2019-186), CAMS Innovation Fund for Medical Sciences (2018-I2M-1-002) and National Key R&D Program of China (2018YFC2000300). We thank the China Marrow Donor Program (CMDP) for providing samples and data support.
Authors' contributions
LZ: Writing the original draft and HLA typing. ER: Sequencing data analysis. DW & XL: Validation. JC: Project administration. All authors have read and approved the manuscript.
Acknowledgements
Not applicable.
Availability of data and material
The sequence of the novel allele described in this manuscript has been deposited in the DNA Data Bank of Japan (Accession No. LC474859) and the IPD-IMGT/HLA Database [5] (Submission No. HWS10054755).
Table 1. The typing results of sample 17ZZ2298 for HLA-A, B, C, DRB1 and DQB1 with different types of reagents and methods.
| Reagents | Methods | HLA-A | HLA-B | HLA-DRB1 | HLA-C | HLA-DQB1 |
| --- | --- | --- | --- | --- | --- | --- |
| BGI | HiSeq | 01:01, 11:01 | 15:32, 57:01 | 07:01, 12:01 | 06:02, 12:03 | 03:01, 03:03 |
| CSTB | Sanger1 | 01:01, 11:126 | 15:32, 57:01 | 07:01, 12:01 | 06:02, 12:03 | 03:01, 03:03 |
| Biocapital | Sanger2 | 01:01, 11:126 | | | | |
| One Lambda | SSO | 01:01, 11:01 | | | | |
| One Lambda | MiSeq1 | 01:01, 11:126 | | | | |
| GenDx | Sanger3 | 01:01, 11:126 | | | | |
| GenDx / ONT | MiSeq2 + MinION (analyzed together) | 01:01, 11:335 (final nomenclature) | | | | |
Table 2. The blast results of the 612 bp fragment in the IMGT/HLA database.
| Align. | Source | Length (bp) | Score (bits) | Identities (%) | Positives (%) |
| --- | --- | --- | --- | --- | --- |
| 1 | H*02:18 | 3144 | 1213.7 | 100 | 100 |
| 2 | H*02:14 | 3502 | 1213.7 | 100 | 100 |
| 3 | H*02:07 | 3502 | 1213.7 | 100 | 100 |
| 4 | A*11:335 | 3071 | 1213.7 | 100 | 100 |
| 5 | H*02:01:02 | 3142 | 1189.9 | 99.5 | 99.5 |
| 6 | HLA-H*01:01:01:01 | 3498 | 1182 | 99.3 | 99.3 |
| 7 | H*01:08 | 3142 | 1182 | 99.3 | 99.3 |
| 8 | H*01:05 | 3498 | 1182 | 99.3 | 99.3 |
| 9 | H*01:04 | 3497 | 1182 | 99.3 | 99.3 |
| 10 | H*01:01:03 | 3143 | 1182 | 99.3 | 99.3 |
| 11 | H*01:01:02 | 3497 | 1182 | 99.3 | 99.3 |
| 12 | H*01:01:01:05 | 3498 | 1182 | 99.3 | 99.3 |
| 13 | H*01:01:01:04 | 3498 | 1182 | 99.3 | 99.3 |
| 14 | H*01:01:01:01 | 3498 | 1182 | 99.3 | 99.3 |
| 15 | H*02:24 | 3146 | 1174.1 | 99.2 | 99.2 |
| 16 | H*02:23 | 3142 | 1174.1 | 99.2 | 99.2 |
| 17 | H*02:20 | 3150 | 1174.1 | 99.2 | 99.2 |
| 18 | H*02:17 | 3141 | 1174.1 | 99.2 | 99.2 |
| 19 | H*02:16 | 3141 | 1174.1 | 99.2 | 99.2 |
| 20 | H*02:15 | 3150 | 1174.1 | 99.2 | 99.2 |
| 21 | H*02:13 | 3510 | 1174.1 | 99.2 | 99.2 |
| 22 | H*02:12 | 3500 | 1174.1 | 99.2 | 99.2 |
| 23 | H*02:11 | 3496 | 1174.1 | 99.2 | 99.2 |
| 24 | H*02:10:01:02 | 3511 | 1174.1 | 99.2 | 99.2 |
| 25 | H*02:10:01:01 | 3511 | 1174.1 | 99.2 | 99.2 |
| 26 | H*02:09 | 3500 | 1174.1 | 99.2 | 99.2 |
| 27 | H*02:08:01:02 | 3140 | 1174.1 | 99.2 | 99.2 |
| 28 | H*02:08:01:01 | 3500 | 1174.1 | 99.2 | 99.2 |
| 29 | H*02:05:01:04 | 3146 | 1174.1 | 99.2 | 99.2 |
| 30 | H*02:05:01:02 | 3146 | 1174.1 | 99.2 | 99.2 |
| 31 | H*02:05:01:01 | 3502 | 1174.1 | 99.2 | 99.2 |
| 32 | H*02:04:02 | 3150 | 1174.1 | 99.2 | 99.2 |
| 33 | H*02:04:01 | 3510 | 1174.1 | 99.2 | 99.2 |
| 34 | H*02:01:01:03 | 3142 | 1174.1 | 99.2 | 99.2 |
| 35 | H*02:01:01:02 | 3489 | 1174.1 | 99.2 | 99.2 |
| 36 | H*02:01:01:01 | 3496 | 1174.1 | 99.2 | 99.2 |
| 37 | H*01:07 | 3143 | 1174.1 | 99.2 | 99.2 |
| 38 | H*01:06:01:02 | 3142 | 1174.1 | 99.2 | 99.2 |
| 39 | H*01:06:01:01 | 3142 | 1174.1 | 99.2 | 99.2 |
| 40 | H*01:03:01:03 | 3142 | 1174.1 | 99.2 | 99.2 |
| 41 | H*01:03:01:02 | 3498 | 1174.1 | 99.2 | 99.2 |
| 42 | H*01:03:01:01 | 3498 | 1174.1 | 99.2 | 99.2 |
| 43 | H*01:02:01:04 | 3141 | 1174.1 | 99.2 | 99.2 |
| 44 | H*01:02:01:02 | 3497 | 1174.1 | 99.2 | 99.2 |
| 45 | H*01:02:01:01 | 3375 | 1174.1 | 99.2 | 99.2 |
| 46 | H*01:01:01:03 | 3492 | 1174.1 | 99.2 | 99.2 |
| 47 | H*02:25:01:01 | 3146 | 1166.1 | 99 | 99 |
| 48 | H*02:05:01:03 | 3146 | 1166.1 | 99 | 99 |
| 49 | H*02:25:01:02 | 3146 | 1158.2 | 98.9 | 98.9 |
| 50 | H*02:19:01:02 | 3136 | 1158.2 | 98.9 | 98.9 |
Table 3. Details of the base mutation and amino acid changes in exon 5 in the novel allele HLA-A*11:335 in comparison to HLA-A*11:01:01:01.
| Codon position | Base position | A*11:01:01:01 | A*11:335 | Codon change | AA substitution |
| --- | --- | --- | --- | --- | --- |
| 276 | 899, 900 | T, G | C, A | CTG>CCA | Leu>Pro |
| 282 | 916 | A | G | ATC>GTC | Ile>Val |
| 288 | 934 | A | G | ATT>GTT | Ile>Val |
| 293 | 951 | C | A | CTC>CTA | Leu=Leu |
| 295 | 956 | G | T | GGA>GTA | Gly>Val |
| 298 | 964 | A | G | ATC>GTC | Ile>Val |
| 305 | 987 | C | T | GCC>GCT | Ala=Ala |
| 306 | 990 | G | A | GTG>GTA | Val=Val |
| 310 | 1001 | G | A | AGG>AAG | Arg>Lys |