Identification of the Novel HLA-A*11:335 Allele, a Rare Interlocus Recombination Involving HLA-A*11:01:01:01 and HLA-H*02:07/14/18 Alleles With Nanopore Sequencing, in a Volunteer From the China Marrow Donor Program

Background: The major histocompatibility complex (MHC) in humans includes three classical class I loci (HLA-A, -B and -C), which are important biomarkers for organ and hematopoietic stem cell transplantation. In the MHC, polymorphism is known to be extremely high, whereas interlocus recombination is rare. We report a rare interlocus recombination between HLA-A and HLA-H, which was analyzed using next generation sequencing (NGS) and nanopore sequencing.
Results: In the sample, the genotypes of HLA-A, B, C, DRB1 and DQB1 were first typed with sequence-specific primer, sequence-specific oligonucleotide (SSO), Sanger sequencing and NGS methods; however, HLA-A could not be phased. Nanopore sequencing was finally used to resolve the sequence of the novel allele. The novel HLA-A*11:335 allele was identified as an interlocus recombination involving the HLA-A*11:01:01:01 and HLA-H*02:07/14/18 alleles; this was mainly achieved by nanopore sequencing.
Conclusions: The identification of the interlocus recombination indicated that nanopore sequencing may be the most precise method for HLA typing. Interlocus recombination has been identified as one of the mechanisms involved in the generation of novel HLA alleles.
The novel HLA-A*11:335 allele, a rare interlocus recombination involving the HLA-A*11:01:01:01 and HLA-H*02:07/14/18 alleles, was identified in a volunteer from the China Marrow Donor Program.
The results indicated that nanopore sequencing may be the most precise method for HLA typing.


Background
The human classical class I loci, which include HLA-A, -B, and -C, encode MHC molecules, which are expressed on all nucleated cells. Their main function is to present intracellularly processed foreign peptides to cytotoxic T cells [1]. Classical and nonclassical class I genes have a similar structural organization. Exon 1 encodes the leader peptide, and exons 2, 3 and 4 encode the three extracellular domains, α1, α2 and α3, respectively. The 57 residues of the antigen recognition site are located in the α1 and α2 domains [2,3]. Exon 5 encodes the transmembrane portion of the molecule, and exons 6, 7 and 8 encode the cytoplasmic tail. Several mechanisms have played a role in the generation of HLA polymorphism, including crossing over, gene conversion and point mutations [4]. Point mutations can occur as single nucleotide substitutions, which may produce synonymous or non-synonymous changes. The rate of non-synonymous changes greatly exceeds the rate of synonymous changes within the antigen binding domain, suggesting a selection-driven mechanism [5]. Besides point mutation, recombination between homologous chromosomes has been identified as a mechanism involved in the generation of a number of HLA alleles. Recombination can involve genes of the same locus or different loci, producing intralocus or interlocus genomic exchanges [6]. We herein report a novel HLA-A*11 allele, A*11:335, which was identified as an interlocus recombination involving the HLA-A*11:01:01:01 and HLA-H*02:07/14/18 alleles in a Chinese bone marrow donor, and analyze the consequence of this recombination. This interlocus recombination was mainly identified by nanopore sequencing.

Methods
Sample origination and first HLA typing
A total of 2964 specimens (approximately 2%) were sampled from the database of recruited volunteers of the China Marrow Donor Program in 2017 and typed a second time for HLA-A, B, C, DRB1 and DQB1. The DNA sample 17ZZ2298 was originally extracted from a volunteer's peripheral blood by the BGI laboratory, which was a cooperating partner of the China Marrow Donor Program. HLA typing for A, B, C, DRB1 and DQB1 was first performed with the BGI next generation sequencing typing method RCHSBT [7] (BGI, Shenzhen, China).

HLA typing confirmation
HLA typing of the sample 17ZZ2298 was performed for a second time with the Sanger sequencing method (Shenzhen Tissue Bank Precision Medicine Co., Ltd., China) to examine the results acquired from RCHSBT. Because the result of HLA-A typing was A*01:01/11:126, which was different from that obtained by RCHSBT (A*01:01/11:01), the sample was typed for a third time using the SSO method (Luminex 3D, One Lambda, USA). The result obtained with the SSO method was A*01:01/11:01, which was the same as that obtained with the RCHSBT method. The sample was further typed using MiSeq-based sequencing (One Lambda) and the assignment was A*01:01/11:126, with the following system comments: "Warning: mismatch in an intron, two or more variants cannot be phased, Locus has a high background position in exon." The sample was further typed by next generation sequencing (NGS) using commercially available reagents (GenDx, Utrecht, The Netherlands) and a MiniSeq system (Illumina, San Diego, California, USA), and by MinION-based nanopore sequencing (ONT, Oxford, UK). Data were analyzed using the NGSengine software program (GenDx).

Sequence blasting
The mismatched region of the novel allele sequence was searched against the IMGT/HLA database using the "BlastN" tool.
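To make the identity figure reported by such a search concrete, the following minimal Python sketch computes the percentage of identical positions between a query and candidate reference sequences and picks the best hit. It is not BlastN itself (no seeding, gaps or scoring matrix); it assumes ungapped sequences of equal length, and the function names and toy sequences are hypothetical and do not represent real HLA data.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percentage of identical positions in an ungapped, equal-length comparison."""
    if len(query) != len(subject):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    matches = sum(q == s for q, s in zip(query.upper(), subject.upper()))
    return 100.0 * matches / len(query)


def best_hit(query: str, references: dict) -> tuple:
    """Return (reference name, identity) for the reference most similar to the query."""
    scores = {name: percent_identity(query, seq) for name, seq in references.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical toy sequences for illustration only (not real HLA sequences).
toy_query = "ACGTTGCAAC"
toy_refs = {
    "ref_A": "ACGTTGCTAC",  # differs from the query at one position
    "ref_H": "ACGTTGCAAC",  # identical to the query
}

hit_name, hit_identity = best_hit(toy_query, toy_refs)
print(hit_name, hit_identity)
```

A real search would use the database's own BLAST service, which also handles gaps and partial alignments; the sketch only illustrates what a result such as "all positions identical" means.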

Transmembrane property analysis
The effects of the 6 missense mutations in exon 5 on the function of the transmembrane domain were analyzed and predicted with the PSIPRED online tool (http://bioinf.cs.ucl.ac.uk/psipred/). The amino acid sequences of exon 5 of HLA-A*11:335 and HLA-A*11:01:01:01 were entered into the online tool and analyzed.

Results
The sample was then further analyzed by MiSeq-based typing (GenDx). The data showed that there was a new allele, but exon 3 and exon 4 could not be phased with the MiSeq data. Therefore, the MiSeq reads were analyzed together with a small number of MinION reads. The recommended genotype was HLA-A*01:01:01:01/A*11:126 (Fig. 1B); however, there were numerous mismatches between exon 4 and exon 6. All mismatches (indicated by blue or red triangles) were in the HLA-A*11 allele and were located between the last heterozygous position in exon 4 (gDNA 1824) and intron 5 (gDNA 2437), a position that was heterozygous in this sample. The bases found at gDNA 1824 matched the two reported HLA-A alleles. The first mismatched position in intron 4 (gDNA 1887) was heterozygous (A/C) in this sample, whereas all known HLA-A alleles have an A at this position. Thanks to the phasing information, we found that the C belongs to HLA-A*11new. The last two heterozygous positions (gDNA 2431 and gDNA 2437) have A-A in one allele and G-T in the other allele. A-A occurs in many HLA-A alleles, while G-T is not present in any HLA-A allele. When region 1887-2437 (matching with HLA-A*11:126) or region 1824-2437 (matching with HLA-A*11:01:01:01) was excluded, the data were an exact match with HLA-A. The typing results of HLA-A with each reagent and the final nomenclature are listed in Table 1.
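The phasing step described above relies on individual long reads that span several heterozygous positions at once. A minimal sketch of that idea, assuming each read has already been reduced to the bases it shows at the positions of interest, is given below. The function name, the read representation and the toy reads are hypothetical; in particular, the linkage of C at gDNA 1887 with G-T at gDNA 2431/2437 in the toy data is an assumption made for illustration, and this is not the algorithm used by NGSengine.

```python
from collections import Counter


def phase_positions(reads, positions):
    """Return the two most frequent base combinations across the given positions.

    reads: list of dicts mapping a genomic position to the base seen on that read.
    positions: heterozygous positions that should be phased together.
    Only reads covering every requested position contribute to the counts.
    """
    combos = Counter()
    for read in reads:
        if all(p in read for p in positions):
            combos[tuple(read[p] for p in positions)] += 1
    return [combo for combo, _ in combos.most_common(2)]


# Hypothetical toy reads (not real data), keyed by gDNA position.
toy_reads = [
    {1887: "A", 2431: "A", 2437: "A"},
    {1887: "A", 2431: "A", 2437: "A"},
    {1887: "A", 2431: "A", 2437: "A"},
    {1887: "C", 2431: "G", 2437: "T"},
    {1887: "C", 2431: "G", 2437: "T"},
    {1887: "C"},  # too short to span all positions, so it is ignored
]

haplotypes = phase_positions(toy_reads, [1887, 2431, 2437])
print(haplotypes)
```

Short reads behave like the last toy read: they show that a position is heterozygous but cannot say which bases travel together, which is why reads spanning the whole region are needed.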

Sequence blast and mutation analysis
The sequence of region 1824-2437 (612 bp: 614 reference positions less the 2-bp "AT" deletion in intron 5) was then blasted in the IMGT/HLA database. The results showed that 612/612 (100%) was an exact match with HLA-H*02:07/14/18, as shown in Table 2.

Transmembrane property analysis
As shown in Table 3, the sequence of HLA-A*11:335 differs from HLA-A*11:01:01:01 by 10 nucleotide substitutions, which resulted in 3 synonymous mutations and 6 missense mutations, mainly in exon 5. Exon 5 encodes the transmembrane domain of HLA-A. We analyzed the effects of the 6 missense mutations on the properties of the transmembrane domain using the PSIPRED online tool (http://bioinf.cs.ucl.ac.uk/psipred/). The results showed that although 6 missense mutations (3 in the transmembrane domain) were produced as a result of interlocus recombination between HLA-A and HLA-H, these mutations did not have a destructive effect on the helix structure of the transmembrane domain (Fig. 3).
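The split into synonymous and missense substitutions follows from translating each affected codon before and after the change. The sketch below shows this classification with the standard genetic code; the function name and the example codons are hypothetical and are not taken from Table 3.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid (or stop) before and after
    if alt_aa == "*":
        return "nonsense"     # amino acid replaced by a stop codon
    if ref_aa == "*":
        return "stop-loss"    # stop codon replaced by an amino acid
    return "missense"         # one amino acid replaced by another


# Hypothetical example codons for illustration only.
print(classify_substitution("GTC", "GTT"))  # Val -> Val
print(classify_substitution("GTC", "ATC"))  # Val -> Ile
print(classify_substitution("TGG", "TGA"))  # Trp -> stop
```

Substitutions that fall in introns have no codon to translate, so they would be counted among the nucleotide differences without being classified by this function.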

New allele nomenclature
The nucleotide sequence of the new allele had been submitted to the DNA Data Bank of Japan (Accession No. LC474859) and to the IPD-IMGT/HLA Database [5] (Submission No. HWS10054755). The name HLA-A*11:335 was officially assigned by the WHO Nomenclature Committee in May 2019. This follows the agreed policy that, subject to the conditions stated in the most recent Nomenclature Report [9], names will be assigned to new sequences as they are identified. The lists of these new names will be published in the next WHO Nomenclature Report.

Discussion
A number of authors [10,11] proposed that interlocus recombination or gene conversion is an important mechanism in the maintenance of MHC polymorphism. A large portion of allelic variation in MHC loci is caused by variations in the antigen recognition site of exons 2 and 3. Furthermore, recombination or gene conversion cannot explain the high rate of nonsynonymous nucleotide substitution in comparison to the rate of synonymous nucleotide substitution. As suggested by [12], the extremely high level of polymorphism at MHC loci (80-90% heterozygosity) seems to be mainly due to over-dominant selection.
Based on our latest data, 191 alleles of the A locus were identified and A*11 was common (frequency: 23.203%) in Chinese volunteers [13]. In this sample, HLA-A was analyzed with different reagents and methods (SSO, Sanger, NGS and nanopore sequencing); however, it could not be phased according to the latest database (version 3.35), and a novel allele was suspected. Thanks to nanopore sequencing, the exact sequence of the novel allele (A*11:335) was determined.
The sequence of region 1824-2437 was then blasted in the IMGT/HLA database and the results showed that 612/612 (100%) was an exact match with HLA-H*02:07/14/18 (Table 2). This suggested that the sample contained a new A*11 allele, which was the result of an interlocus genomic exchange between HLA-A and HLA-H (see Supplementary File). HLA-H is located between HLA-A and HLA-G, which are separated by less than 300 kb in the class I region of the MHC [14]. Putative protein prediction from the novel HLA-H alleles ranged from ...
The sequence of A*11:335 differs from HLA-A*11:01:01:01 by 10 nucleotide substitutions, resulting in 3 synonymous mutations and 6 missense mutations in exon 5 (Table 3). Exon 5 encodes the transmembrane domain of HLA-A. The effects of the 6 missense mutations on the properties of the transmembrane domain were analyzed using the PSIPRED online tool. The results showed that although 6 missense mutations (3 in the transmembrane domain) were produced by interlocus recombination between HLA-A and HLA-H, these mutations did not have a destructive effect on the helix structure of the transmembrane domain (Fig. 3). The mechanism and the consequence of this interlocus recombination remain largely unknown.
The nanopore sequencing method can read long sequences (N50 > 30 kb) from a single DNA molecule, while the read lengths of Sanger sequencing and NGS are limited (< 1000 bp), which restricts their ability to clearly distinguish the sequence of a single allele.
The use of the clinical samples described in this study is covered by the institutional review board of BGI located in Shenzhen, China (BGI-S022-T1). Written informed consent was obtained from subjects before blood samples were taken.

Consent for publication
Not applicable, as the manuscript does not contain any individual identifiable patient data.

Competing interests
The authors declare no conflicts of interest in association with the present study.