Background
Non-reference sequences (NRS) represent structural variations in the human genome with potential functional significance. However, apart from the known insertions, it is currently unknown whether other types of structural variation involving NRS exist.

Results
We compared 31 human de novo assemblies with the current reference genome to identify NRS and their locations. We resolved the precise locations of 6,113 NRS totaling 12.8 Mb. Besides 1,571 insertions, we detected 3,041 alternate alleles, defined as sequences having less than 90% identity (or no identity) with the corresponding reference alleles. These alternate alleles overlapped 1,143 protein-coding genes, including a putative novel MHC haplotype. Furthermore, the alternate alleles and their flanking regions had a high content of tandem repeats, suggesting that their origin is associated with tandem repeats.

Conclusions
Our study detected a large number of NRS, including many previously uncharacterized alternate alleles. We suggest that the origin of alternate alleles is associated with tandem repeats. Our results enrich the known spectrum of genetic variation in the human genome.
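The distinction between insertions and alternate alleles rests on the 90% identity threshold stated above. As a minimal sketch of that rule, and not the authors' actual pipeline, the function below classifies a placed NRS. The function name `classify_nrs`, the parameter `ref_span_len` (length of the reference segment the NRS replaces), and the `"other"` fallback label are hypothetical and introduced only for illustration. The sketch also assumes that an insertion is an NRS that replaces no reference sequence.

```python
from typing import Optional

# Threshold taken from the abstract: "less than 90% (or none) identity".
IDENTITY_THRESHOLD = 0.90


def classify_nrs(ref_span_len: int, identity: Optional[float]) -> str:
    """Classify a placed NRS (illustrative only).

    ref_span_len: length in bp of the reference segment replaced by the NRS
                  (hypothetical parameter; 0 is assumed to mean a pure insertion).
    identity:     fractional identity (0.0 to 1.0) between the NRS and the
                  reference allele, or None if no alignment was found.
    """
    if ref_span_len < 0:
        raise ValueError("ref_span_len must be non-negative")
    if identity is not None and not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be between 0.0 and 1.0")

    # Assumption: an NRS that replaces no reference bases is an insertion.
    if ref_span_len == 0:
        return "insertion"
    # Alternate allele: less than 90% identity, or no identity at all.
    if identity is None or identity < IDENTITY_THRESHOLD:
        return "alternate_allele"
    # Anything at or above the threshold is outside the abstract's definition.
    return "other"
```

For example, `classify_nrs(500, 0.85)` falls under the alternate-allele definition, whereas `classify_nrs(0, None)` is treated as an insertion under the assumption stated above.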

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6
Posted 30 Sep, 2019