Background
Upland cotton (Gossypium hirsutum) is a crucial natural textile fiber and oilseed crop worldwide. Copy number variation (CNV) is a form of genomic structural variation that underlies phenotypic diversity. Understanding genomic CNVs in Upland cotton is important for accelerating cotton breeding programs. Therefore, the present study generated a map of CNV regions (CNVRs) in Upland cotton using an SNP array-based approach. The roles of the identified CNVs were explored through GO enrichment analysis of genes located in CNVRs.
Results
We used the Illumina Cotton SNP 70K BeadChip array for genome-wide detection of CNVs in 288 Upland cotton accessions. A total of 341 CNVRs were detected, representing approximately 2.0% of the entire cotton genome. Of these, 264 were gains, 55 were losses, and 22 contained both gains and losses within the same region. The average length of CNVRs was 125.5 kb and the median length was 80.4 kb. Gene ontology (GO) enrichment analysis revealed that genes associated with metabolic process, catabolic process, multiple enzyme activities, and transcription regulator activity were significantly overrepresented.
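As an illustration of how per-accession CNV calls can be collapsed into CNVRs and labeled as gain, loss, or both, the following is a minimal sketch only. The abstract does not state the merging rule used in this study; the sketch assumes that calls on the same chromosome are merged whenever they overlap by at least one base. The `CNV` record, the `merge_cnvs` function, and the toy coordinates are hypothetical names and values introduced here for illustration.

```python
from collections import namedtuple

# Hypothetical record for one CNV call; kind is "gain" or "loss".
CNV = namedtuple("CNV", "chrom start end kind")


def merge_cnvs(cnvs):
    """Merge overlapping CNV calls on the same chromosome into CNVRs.

    Assumption: any overlap (>= 1 bp) joins two calls into one region.
    Each region is labeled "gain", "loss", or "mixed" (both types present).
    """
    regions = []
    for c in sorted(cnvs, key=lambda c: (c.chrom, c.start, c.end)):
        last = regions[-1] if regions else None
        if last and last["chrom"] == c.chrom and c.start <= last["end"]:
            # Overlaps the current region: extend it and record the call type.
            last["end"] = max(last["end"], c.end)
            last["kinds"].add(c.kind)
        else:
            # No overlap: start a new region.
            regions.append(
                {"chrom": c.chrom, "start": c.start, "end": c.end, "kinds": {c.kind}}
            )
    for r in regions:
        r["type"] = "mixed" if len(r["kinds"]) > 1 else next(iter(r["kinds"]))
    return regions


def summarize(regions):
    """Return (mean length, median length) of the regions in base pairs."""
    lengths = sorted(r["end"] - r["start"] for r in regions)
    n = len(lengths)
    mean = sum(lengths) / n
    mid = n // 2
    median = lengths[mid] if n % 2 else (lengths[mid - 1] + lengths[mid]) / 2
    return mean, median
```

Under these assumptions, a gain call and a loss call that overlap on the same chromosome would be reported as a single "mixed" region, which corresponds to the third category of CNVRs counted above.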
Conclusions
The finding that 90.3% of CNVRs harbored genes indicates a critical role for CNVs in cotton. We identified two intriguing candidate biological process terms, polysaccharide metabolic process and catabolic process, related to cotton fiber development. Our research characterized the first landscape of CNVRs in cotton, providing an important resource for investigating cotton genetic variability. It will facilitate further research into the molecular mechanisms underlying CNV-associated traits and improve our knowledge of the genomic structure of cotton.