Genome-Wide Association Study of Actinidia Chinensis

Background: A total of 74,936 SNPs were employed to carry out GWAS and post-GWAS of the fty-six accessions, representing the two most valuable varieties of Actinidia chinensis, namely, A. chinensis var. chinensis and A. chinensis var. deliciosa, are mainly distributed in China. The percentage of heterozygous sites of A. chinensis var. deliciosa is higher than that of A. chinensis var. chinensis, which could be one of the reasons for A. chinensis var. deliciosa high disease resistance. LD decay distance of male plants is shorter than that of female plants. Results: Fifty-six accessions were divided into two subgroups. Analysis of molecular variance shows that the frequency of genetic variations within the population is 83.53% and 16.47% between populations. Fst between the two populations is 0.14, and Nm is 1.60. Set at α ≤ 0.05, a total of 327 SNPs and 260 haplotypes were related to the hispidus. A total of 246 proteins were annotated using GO and KEGG analyses, which indicated the membrane-related genes and stress-resistant metabolic pathways are related to the hispidus condition of leaves, stems, and peels of kiwifruit, which is also the result of the adaptation of A. chinensis var. deliciosa to its growth environment. Conclusions: Haplotype analysis showed that the evolution of is A. chinensis var. deliciosa later than that of A. chinensis var. chinensis.

related research, particularly in terms of a multigene pyramiding breeding technology centering on molecular marker-assisted selection that can select genotypes of quality production traits at the DNA level. Such a method circumvents the disadvantage of low accuracy of conventional breeding technology improves breeding e ciency.

Mapping rate and heterozygous proportion
Kiwifruit is a functional dioecious plant. Currently, only the genome of the female plant, named, red 5, has been released in 2018 (ncbi.nlm.nih.gov/genome/genomes/16401). Therefore, the sequence of kiwifruit can only be aligned with red 5. The mapping rate (number of reads on the reference genome divided by the number of valid sequencing data) and the heterozygous proportion can roughly re ect the classi cation of the sample. The mapping rate of 56 kiwifruits ranged from 68.08% to 97.30% (average: 90.49%). Among these, the mapping rate of the A. chinensis var. chinensis increased from 91.31% to 97.30%, with an average of 95.57%. The mapping rate of A. chinensis var. deliciosa increased from 68.08% to 89.85%, with an average of 87.96%, which was lower than that of A. chinensis var. chinensis. The mapping rates of the two plants that were assessed morphologically observations, A. polygama, were 89.94% and 90.51%, respectively. A. polygama may be an intermediate type between A. chinensis var. chinensis and A. chinensis var. deliciosa. The average mapping rate of the female plant is 91.60%, and the mapping rate of the male plant is 89.39%.Therefore, the difference at the genome level between male and female plant is less than that between subspecies of A. chinensis.
The heterozygous proportion ranged from 45.02% to 12.03%. The mapping rate was negatively correlated with the heterozygous proportion. A. chinensis var. chinensis showed a high mapping rate and a low heterozygous promotion. However, A. chinensis var. deliciosa showed a low mapping rate and a high heterozygous proportion (Fig. 1).

LD and haplotype blocks analysis
Approximately 74,936 SNPs were unevenly distributed among 29 chromosomes (Fig.2a); chromosome 23 contained the highest number of markers (4,645), whereas chromosome 17 had the lowest (1,824). SNPs are infrequently inherited independently and thus have a propensity to be inherited in clusters, thereby resulting in linkage disequilibrium (LD) as well as haplotypes. Haplotype blocks are expected in A. chinensis genomes to be non-randomly distributed (Fig.2b). The 29 chromosomes yielded a total of 4,379 predicted haplotype blocks; chromosome 23 possessed the highest number of haplotype blocks (303), whereas chromosome 17 had the least (108). The largest haplotype blocks were composed of 18 SNPs, and the average number of SNPs within all haplotype blocks was 3.
Whether at a single chromosome (Fig. 3a) or genome-wide level (Fig. 3b), LD shows that as the physical distance increases, the smaller becomes r 2 between two SNP sites or even null. For different samples, LD also decreases with increasing distance between the two SNP sites (Table 1). LD in the male plant had signi cantly higher decay than that of the female plant, and the LD of A. chinensis var. deliciosa had signi cantly higher decay than A. chinensis var. chinensis. The longest haplotype blocks span 194.74 kb, the average length on haplotype blocks was 3.07 kb. A. chinensis var. chinensis is shorter than A. chinensis var. deliciosa in terms of average span of haplotypes, this shows that the evolution of A. chinensis var. chinensis is later than that of A. chinensis var. deliciosa.

Population structure
Clustering of population structure was performed by principal component analysis (PCA), the unweighted pair-group method with arithmetic means (UPGMA), and Bayesian clustering algorithm.

PC analysis
In PCA, the rst three principal components accounted for 18.45%, 11.35%, and 5.02% of the total variation, respectively. The results of clustering based on the rst three PCs were initially separated by varieties and intermediate types of material. A. polygama (No. 12 and 38) and A. chinensis var. chinensis were clustered together, and the yellow pulp kiwifruit (No. 47), a variant of A. chinensis var. chinensis, was separated from the other samples (Fig. 4).

UPGMA
The phylogenetic tree can be divided into two subgroups consisting of 45 (subgroup I) and 11 (subgroup II) accessions. The results of primary clustering and morphological observations are not completely consistent (red represents A. chinensis var. chinensis, green represents A. chinensis var. deliciosa, and yellow-green represents A. polygama) (Fig. 5). In subgroup I, the 45 accessions, including A. chinensis var. Chinensis and A. chinensis var. deliciosa, can be divided into several sub-classes at different similarity coe cients levels, in which 20 accessions that were genetically related were gathered together [including two A. polygama plants and one yellow pulp kiwifruit (asterisks)]. In subgroup II, all accessions belong to A. chinensis var. deliciosa. Based on morphological observations, those may have occurred earlier in evolution than A. chinensis var. deliciosa in subgroup I.

Bayesian clustering
We set K from 1 to 5 using previous population information. Delta K reached a maximum value at K=2 (Fig. 6a), suggesting that the 56 accessions were divided into two subgroups (consisting of 19 and 37 accessions) (Fig. 6b). In particular, the results of No. 47 sample are inconsistent among the three clustering methods.In the population structure analysis, the results from K = 2 to K = 5 revealed the occurrence of gene introgression between A. chinensis var. Chinensis and A. chinensis var. deliciosa, accounting for approximately 57.14% of the observed variations (calculated by K = 2).

Introgression analysis
The present study determined that gene ow (Nm) was 1.60. The Nm in animal-pollinated plants is 0.801 and 5.380 in wind-pollinated plants (Chen 1996). Kiwifruit is a typical insect-pollinated species under natural conditions. However, arti cial pollination is often used in cultivation to improve quality and production. If a certain region of the genome is introduced into a population by gene ow, then it will continuously exist in the population, indicating that the region may be subject to positive selection or balanced selection; the degree of LD in these regions increases, and the length of the shared haplotype will be shorter. The average haplotype length of the 56 samples is 3.07 kb. The average proportion identity-by-descent (IBD) was 0.49, which ranged from 0.17 to 0.99. Otherwise, this area will be removed from this population by negative selection or genetic drift. Gene ow within A. chinensis species is one of the important ways to meet the market demand for this crop and to adapt to the environment.
Analysis of molecular variance AMOVA indicated that 16.47% of the total variation was due to differences among populations, and 83.53% of the variation was due to a difference within populations ( Table 2).

Genotype-phenotype association analysis
No value can be correctly associated with soft or hard hispidus as well as to more or less hispidus on the leaves, stems, and peels of kiwifruit, but a phenotypic value of 1 or 2 was assigned to the hispidus associated study between A. chinensis var. deliciosa and A. chinensis var. chinensis. The threshold of association test was set at α ≤ 0.05, that is, P ≤ 0.05 /74936= 6.67E-7. a) Single SNP association test 327 loci were associated with hispidus condition (Table S1). The strongest signal came from the SNP at position 15,347,013 on Chr. 16, followed by those from markers on Chrs. 8 and 16 (Fig.7). b) Haplotype association test 260 haplotype blocks were associated with hispidus condition. One signal from Chr. 16 represented the most signi cant association. Followed by those from markers on Chrs. 8 and 16. (Table S2).The results of gene mapping for hispidus in Actinidia chinensis based on haplotype association test are shown in Figure 8.

GO-enrichment and KEGG pathway
GO function analysis showed ( Fig.9) that differential genes related to cell components signi cantly enriched in integral components of the membrane, nucleus, membrane, and chloroplast. Differential genes involved in molecular functions signi cantly enriched the categories of ATP binding, metal ion binding, DNA binding, and protein kinase activity.
To further understand the biological pathways in which the differential genes are mainly involved, KEGG pathway enrichment analysis of the identi ed related genes was performed (Fig.10). The results showed that the related genes belong to 105 metabolic pathways. The rst three types of pathways include metabolic, biosynthesis of secondary metabolites, and biosynthesis of antibiotics.

Discussion
In the past few decades, SSR technology has been widely used in related elds of bioscience. Since the rst report of the application of genome-wide association study based on SNPs named "complement factor H polymorphism in age-related macular degeneration" (Robert 2005). with the development of next-generation sequencing technology, SNPs has many advantages in the elds of genotyping, phyletic evolution, diversity, linkage mapping, QTL mapping and so on. In order to improve the cost-effectiveness of sequencing, the simpli ed genomic sequencing technology based on the restriction site-associated DNA sequencing (RAD tags) has been widely used (zhang 2020a). Among these, at present, GBS technology has become a main technology to simplify sequencing for simple process and using methylation sensitive enzymes avoiding the genomic repeat region. It is especially suitable for large sample size research, which can quickly identify high-density mutations and accelerate biological research.
GWAS is based on population structure analysis. Many studies have shown that complex population structure will lead to false positive between genotype and phenotype ( Price 2010 Haplotype analysis has been widely used in the genetic evolution of biological resources (Wang 2017;Xu 2018). Haplotype is likely to play a role in the process of genetic evolution as a functional unit.In this study, haplotype analysis showed that the evolution of A. chinensis var. chinensis is later than that of A. chinensis var. deliciosa, which is different from Huang (Huang 2013). The loss of the hispidus of A. chinensis var. Chinensis may be the result of low altitude cultivation, which also leads to the decrease of resistance to low temperature and disease.Haplotypes also play an important role in genes mapping related to complex diseases as well as important traits in animals and plants, and provide important systematic information for selection, matching and breeding of quantitative traits. In the associated analysis based on single SNP test and haplotype test, the peak SNP is often appeared within haplotype. According to the previous haplotype test of genome-wide association study, we can see that the haplotype implies gene clusters related to traits (Zhang 2020b).
Statistical analysis of association may lead to some wrong decisions when strategies that reduce data analysis or scienti c reasoning to mechanical standards are used alone; for example, α < 0.05. So far, no report has been found about the combination of GWAS and post -GWAS analysis. It is necessary to perform post-GWAS such as GO and KEGG to better understand the pathways related to the development of traits. Thus, we merged the results of the SNP and the haplotype tests, a total of 327 SNPs and 260 haplotypes were related to the hispidus. A total of 246 proteins were annotated using GO, the related genes are mainly enriched in molecular functions and cell components, but less in biological processes. KEGG analyses showed that the membrane-related genes and stress-resistant metabolic pathways are related to the hispidus condition of leaves, stems, and peels of kiwifruit, which is also the result of the adaptation of A. chinensis var. deliciosa to its growth environment. Meanwhile, hispidus is simultaneously accompanied by changes in leaf color, i.e., the more hispidus on the leaves, the lighter the shade of green on the leaves. The results provide a genetic basis for the evolution of A. chinensis and the formation of hispidus.

Plant Materials
A total of 56 accessions, representing most of the A. chinensis germplasm resource of the Qinba area in China, were collected from the kiwifruit experimental farm of different regions during the 2019 growing season (Table 3) and comprised 18 accessions of A. chinensis var. chinensis, 36 accessions of A. chinensis var. deliciosa, and 2 accession of A. polygama, which were established based on morphological observations.

Genotype data
Genotyping-by-sequencing (MseI digestion) was used to obtain SNP data for this panel of 56 accessions using the Illumina HiSeq2000 sequencing platform. A total of 74,936 SNPs passed the MAF lower limit of 0.05.

LD and haplotype analysis
Genotype data were then used to calculate LD between SNPs and to construct haplotypes using the EM algorithm implemented by PLINK1.90 (https://www.cog-genomics.org/plink2). The command line "--r 2 " and "--blocks" were used to calculate LD and assign SNPs to their respective haplotypes by calculating inter-maker LD within a 200kb window, respectively. obtain an estimate of the most probable number of population (K), K values from 1 to 5 were simulated with 10 iterations for each K, using 10 0000 burn-in periods followed by 10 0000 Markov Chain Monte Carlo iterations. Delta K was plotted against K values and the best number of clusters was determined following the method proposed by Evanno et al (Evanno et al. 2005), and the population structure diagram was obtained via the Structure Harvester platform (Earl et al. 2012).

AMOVA and introgression analysis
On the basis of mapping rate and population structure analysis, for analysis of molecular variance (AMOVA), 56 accessions were classi ed into two groups, namely, A. chinensis var. chinensis and A. chinensis var. deliciosa. The components of variance attributable to different varieties were estimated from the genetic distance matrix. The Tajima & Nei method was used to estimate molecular distance.
The distance method used for locus-by-locus analysis was pairwise difference. As speci ed in the AMOVA procedure,

Association analysis
Single SNP association with the allelic association tests were run in Plink1.90, Haplotype association tests were performed with haplotype-based C/C tests using Plink1.07.

GO-enrichment and KEGG analysis
We integrated the results of the commonly detected loci/regions, including the single SNP association tests and haplotype association tests. A total of 246 genes were assigned, and functional annotation was performed using KEGG pathway analysis in KEGG Mapper (https://www.genome.jp/kegg/mapper.html).

Declarations
Ethics approval: Not applicable.
Consent to participate All authors declare that we agreed to participate in the publication.
Consent for publication: All authors declare that we agreed to publicate the paper in BMC Plant Biology.
Availability of data and material (data transparency) Con icts of interest/Competing interests: The authors declare that they have no con ict of interest.    Structure estimation of populations and Bayesian clustering. a. Structure estimation of populations for K ranging from 1 to 5 by delta K. b. Bayesian clustering based on 74936 SNPs for 56 plants. Note: Red: group I; Green: group II. Vertical lines on the X-axis refer to each sample. The proportion of each color represents probability rate with which a given genotype belongs to each group. The cluster of yellow pulp kiwifruit gives different results using various clustering methods. PA analysis revealed that yellow pulp is a distinct characteristic of kiwifruit. Prioritizing candidate haplotype GO functional enrichment analysis.

Figure 10
KEGG pathways enrichment analysis.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. 9.20kiwifruitSupplementarydata.doc