Genome-Wide Association Mapping of Resistance to the Sugarcane Aphid in Sorghum bicolor

Somashekhar Punnuri (  punnuris@fvsu.edu ) Agriculture Research Station, 1005 State University Dr, Fort Valley State University, Fort Valley, GA 31030 Addissu Ayele Agriculture Research Station, 1005 State University Dr, Fort Valley State University, Fort Valley, GA 31030 Karen Harris-Shultz USDA-ARS, Crop Genetics and Breeding Research Unit, 115 Coastal Way, Tifton, GA 31793 Joseph Knoll USDA-ARS, Crop Genetics and Breeding Research Unit, 115 Coastal Way, Tifton, GA 31793 Alisa Co n USDA-ARS, Southeast Watershed Research Laboratory, 2316 Rainwater Road, Tifton, GA 31793 Haile Tadesse USDA-ARS, Southeast Watershed Research Laboratory, 2316 Rainwater Road, Tifton, GA 31793 Scott Armstrong USDA-ARS, Wheat Peanut and Other Field Crops Research Unit, 1301 N. Western Road, Stillwater, OK 74075 Trahmad Wiggins Agriculture Research Station, 1005 State University Dr, Fort Valley State University, Fort Valley, GA 31030 Hanxia Li University of Georgia Institute of Bioinformatics, 120 Green Street, Athens, GA 30602 Scott Sattler USDA-ARS, Wheat, Sorghum and Forage Research Unit, Lincoln, NE 68583 Jason Wallace University of Georgia, Department of Crop & Soil Sciences, 120 Carlton Street, Athens GA 30602


Introduction
Sorghum (Sorghum bicolor (L.) Moench) is the fth most important cereal crop worldwide, and it is used for food, feed, ber, and fuel 1,2 . Farmers in the United States use sorghum mainly as livestock feed and silage while the grain is also utilized for bird feed and to produce ethanol 3 . Sweet sorghum, with large amounts of free sugars in the stem juice, can potentially be used as a next-generation biofuel feedstock for ethanol and bioenergy production 4 . The United States is the world's largest producer and lead exporter of grain sorghum, having harvested 1.9 million hectares (4.7 million acres) and having exported 6.99 million metric tons (275 million bushels) in 2020. Importers of U.S. sorghum included Mexico, China, Japan, Ethiopia, and New Zealand 5 . Grain sorghum our is also gluten-free and has potential nutritional and health bene ts 6 . Its wide agro-climatic adaptation results from its high phenotypic diversity, which enabled its expansion from the Sahel desert into temperate and subtropical regions of Asia and the Americas 7 .
Since 2013, sorghum farmers in the U.S. have suffered yield losses caused by a new invasive pest, the sugarcane aphid (SCA), Melanaphis sacchari (Zehntner) 8 . Although still unclear how this new super-clone came to the U.S. 9, 10 , it was rst discovered near Beaumont, TX on grain sorghum 11 . By the end of 2013, sugarcane aphids had colonized sorghum across four states, including Texas, Louisiana, Mississippi, and Oklahoma 8 . Sorghum in these areas was so heavily infested that combines would clog during harvest due to the sticky honeydew that aphids had secreted on the plant material 12 . Currently, sugarcane aphids have spread throughout the U.S. and are found in all areas that grow sorghum.
Assessing insect damage in a large eld experiment can be very slow and laborious, and visual ratings may be subjective. The need for highthroughput phenotyping (HTP) has spurred advances in using image processing, robotics, and drone-based methods to assess insect resistance in crop plants. Outputs from remote sensing applications provide fast and accurate forecasting of targeted insect pests 28 , pest damage [29][30][31] , and pest monitoring [32][33][34][35][36] . While satellite-borne sensors have been useful to monitor broad regional and global cropping systems, multi-and hyperspectral sensors held at closer distances (from a few centimeters to a few hundred meters) provide extremely ne resolution data associated with plant responses to stressors 34,37 , including aphid stress 38 . In these cases, sensors may be borne on lowaltitude aircraft, including unmanned aerial systems (UAS, also known as drones), mounted on xed towers, tethered on cables, or held by hand. In the last several years, UAS borne sensors have become ubiquitous in agricultural research, presenting a booming frontier for research and data collection to answer research questions surrounding crop pest and disease detection and monitoring, as well as insecticide and fungicide applications worldwide. One complication for sugarcane aphid population estimation is that the aphids feed on the abaxial (underside) leaf surface. Therefore, the direct visualization of aphids with current high-throughput platforms may not be easy, even at close range.
The Sorghum Association Panel (SAP) is a collection of accessions that represent all the major cultivated races and U.S. sorghum breeding lines 39 . Our objective was to examine the hypothesis that there may be multiple genomic regions associated with aphid resistance other than the RMES1 locus and other QTL identi ed on SBI-06. To identify new regions of the sorghum genome that confer resistance to the sugarcane aphid, we used 276 accessions from the existing SAP database plus 7 additional inbred lines with resistance to sugarcane aphid or other insects, to perform genome-wide association (GWAS) for sugarcane aphid resistance using different insect-related phenotyping traits such as the number of aphids, plant damage rating, and other physio-morphological traits. To facilitate the rapid assessment of plant damage across all accessions we used UAS multi-spectral imagery to corroborate the damage assessed through visual inspection.

Results
For the eld trials, sugarcane aphid population number, damage, plant height, and owering time were recorded for two years in 2019 and 2020. In 2019 and 2020, the rst alate aphids (winged aphids) were observed on sorghum on July 11 and July14, respectively. Aphid population increased gradually in both years before the population collapsed in the second half of August each year. Aphid counts and damage ratings were started at peak of the aphid population growth, recorded as rst ratings on August 14, 2019, and August 11, 2020, while the second ratings were taken on August 28, 2019 and August 18, 2020, respectively. The sugarcane aphid damage on the SAP panel was rated twice in each year and the ratings were averaged across both years ("D1" and "D2" for the rst and second rating dates, respectively).

General statistical summary
Aphid damage on plants was rated using a 1-9 scale, with 1 describing plants that showed no aphid-induced plant damage and 9 describing plants that had heavy aphid infestation up the ag leaf and 80% of the leaves displayed aphid damage 40 . There were 5 resistant lines that scored between 1.0-3.0 and 12 susceptible lines that scored 8.0-9.0 scale (Supplementary Table S1-a). There were two susceptible checks (N98 & N109B) and two resistant checks (No. 5-Gambela and SC110). The average aphid damage score over two seasons for the susceptible checks ranged from 2.3 to 6.3, while for the resistant checks aphid damage rating ranged from 1.0-3.2 (Supplementary Table S1-b).
Most SAP accessions were susceptible to aphid damage, with a mean damage score of 3.8 for D1 and 5.9 for D2 (Supplementary Table S1-c).
Aphid induced plant damage for D1 and D2 ranged from 1.0-7.0 and 2.5-9.0, respectively. The average aphid count over the two seasons for the rst-rating dates (AC1) ranged from 48 to 1500 aphids leaf -1 , while the average of the second-rating dates aphid count (AC2) ranged from 0 to 1166 aphids leaf -1 (see Methods and abbreviations for details). For the SAP lines grown in the greenhouse assessed for aphid-induced plant damage (GHD) at the seedling stage, the average damage rating was 8.5 and the range varied from 5.0-9.0.
Remotely sensed imagery from a UAS-borne sensor was used to calculate three vegetation re ectance indices, namely, Normalized Difference Red-Edge (NDRE), Normalized Difference Vegetation Index (NDVI), and Soil Adjusted Vegetation Index (SAVI). Wall-to-wall coverage of the index values was used to score aphid damage more accurately across the entire eld. Only the UAS-based data collected on August 25, 2020, was used for analysis as it had a high correlation to manual phenotyping and high heritability values. The NDRE value ranged from 0.12 (high aphid damage) to 0.51 (low or no aphid damage) with a mean of 0.28. The NDVI scores ranged from 0.16 to 0.78 with a mean of 0.46, while the SAVI score ranged from 0.07 to 0.41 with a mean of 0.16 (Supplementary Table S1-c).
The average plant height for the two growing seasons was 106 cm, ranging from 39 to 285 cm (Supplementary Table S1-c). Flowering time ranged from 41 to 97 days after planting, with an average of 59 days after planting.

Correlation analysis
The correlation between re ectance indices from 2020 and average aphid damage ratings from 2019 and 2020 were calculated. A signi cant and negative linear relationship was observed between aphid damage for the rst-rating date in 2020 and UAS based re ectance indices with NDRE (r = -0.41), NDVI (r = -0.38), and SAVI (r = -0.35) (Supplementary Table S1-d). Similarly, the second-rating date for visual aphid damage in 2020 also showed a relatively strong negative correlation with NDVI (r = -0.48), SAVI (r = -0.50), and NDRE (r = -0.52). The average rst-rating dates (D1) for aphid damage for 2019 and 2020 were also negatively correlated with NDVI (r = -0.43), SAVI (r = -0.41), and NDRE (r = -0.44). A relatively strong negative linear relationship was observed between the second-rating dates (D2) for aphid damage for 2019 and 2020 with NDVI (r = -0.55), SAVI (r = -0.56), and NDRE (r = -0.58). The correlation between re ectance indices and the rst average aphids count (AC1) ratings showed a negative linear relationship, whereas the second aphid count ratings (AC2) showed a positive relationship. A relatively strong positive linear relationship was observed between AC1 and D1 (r = 0.62), as well as AC1 and D2 (r = 0.51) (Supplementary Table S1 Genome-wide association was performed using two complementary methods. First, we performed single-marker regression with a generalized linear model (GLM), using genetic principal coordinates to correct for population structure in TASSEL 41 . Signi cant hits were identi ed by using 1000 permutations to establish empirical p-values, with empirical p≤0.01 considered as signi cant. In a complementary analysis, we used Fixed and random model Circulating Probability Uni cation (FarmCPU), a model-selection algorithm that includes both xed and random effects components 42 . Preliminary analyses found individual runs of FarmCPU to be unstable, so the algorithm was rerun100 times with 90% subsampling to identify the most robust hits, scored by their "resample model inclusion probability" (RMIP) 43 ( see Methods for details). The RMIP values of 0.05 or greater were considered signi cant 44 . Genomic regions identi ed by both methods-meaning low empirical p-values and high RMIPs-are high-con dence hits; those identi ed by only one are still signi cant but of lower con dence. For those traits that did not pass through signi cant thresholds and also showed weak signals in the above two methods, GWAS was also performed with FarmCPU in GAPIT package from R software without any iterations. GWAS analysis identi ed a total of 200 MTAs for different aphid resistance-related traits, owering time, and plant height which are discussed below (Supplementary Table S2-a).
AC1-Aphid Count rst rating dates and AC2-Aphid Count second rating dates FarmCPU-RMIP model selection analysis identi ed 11 loci for AC1( Fig. 1, Supplementary Table S2-a). Three loci each were located on SBI-07 and SBI-09, while the remaining loci were distributed with a single SNP on SBI-01, SBI-02, SBI-06, SBI-08, and SBI-10. Among these markers, S8_11781182 showed a strong association (RMIP=0.41) with AC1. All SNPs that were signi cantly associated with AC1 explained 55% of total phenotypic variation (Supplementary Table S3). Two markers, S7_58778565 and S7_58839694 signi cantly associated with AC1, are located close (~479 kb) to the gene Sobic.007G151100 on SBI-07, which encodes a protein that is similar to 12-oxo-phytodienoic acid reductase, a gene expressed in response to sugarcane aphid attacks 45 . There were no signi cant associations identi ed for AC2 in either GLM or RMIP methods.

D2-Second aphid damage ratings
Use of the RMIP method detected 14 SNPs and GLM method detected 6 SNPs associated with aphid damage D2 ( Fig. 1; Supplementary Table   S2-a). These 20 MTAs were detected on all chromosomes except SBI-02 and SBI-04, with the strongest associations on SBI-01, SBI-06, SBI-07, SBI-08, and a cluster of SNPs on the tail end of SBI-09 with higher RMIP and p values. The portion of phenotypic variance explained by signi cantly associated SNPs was 63% (Supplementary Table S3). GLM analysis detected one MTA on SBI-06 (S6_3050581) approximately 298 kb away from the RMES1 locus. The S6_53836196 marker is located ~99 kb away from Sobic.006G184300 gene on SBI-06 that encodes a phloem protein 2 gene known to be induced due to sugarcane aphid infestation 46 . RMIP identi ed two SNPs, S9_56657642, and S9_56678671, which were < 1Mb away from WRKY transcription factor SbWRKY86, Sobic.009G238200 (S9_57628850 to 57630763), which was differentially expressed in a gene expression study 45 . The SNP S8_11781182 showed a strong association with damage D2 in both GLM and RMIP analyses.
Greenhouse evaluation of aphid damage-(GHD) There were no signi cant associations found for aphid damage in the greenhouse, except on SBI-07 using the FarmCPU RMIP method (Fig. 1).
The marker on SBI-07, S7_2494675 was identi ed with RMIP value= 0.09 (Supplementary Table 2 GWAS analysis for re ectance indices to assess SCA damage (with and without owering time as a covariate) Analysis of UAS re ectance data was performed both with and without owering time as a covariate in case plant maturity had a signi cant impact on the observed spectra.
A total of 6 SNPs were associated with NDRE when owering time was used as a covariate in the model selection analysis. Three of these SNPs were located on SBI-06, while a single SNP was found on SBI-01, SBI-02, and SBI-05. No signi cant SNP associated with NDVI was detected when the owering time was used as a covariate. A single SNP marker (S9_3054933) was signi cantly associated with SAVI when owering time was used as a covariate (Fig. 2 We also did analysis without using owering time as a covariate, and the RMIP analysis identi ed 20 SNPs signi cantly associated with NDRE. Of these, four SNPs were found on SBI-05, three on SBI-03, SBI-06, and SBI-08, two on SBI-01, SBI-02, and SBI-10, and a single SNP on SBI-07 (Supplementary Fig. S1 and Supplementary Table S2-a). There were four markers, namely, S1_5101469, S2_61431704, S6_58686833, and S6_60922873, that were signi cantly associated with NDRE and found in either case with or without using owering time as a covariate and the rest 16 SNPs were new loci when owering time was not used as covariate. Some of the important notable ones were S6_32034639 marker was found ~366 kb of the sugarcane aphid resistance QTL, qtlMs-6.2, located between 31,020,371 to 32,400,770 bp 27 . Among several loci identi ed, SNP S8_59192389 associated with NDRE is approximately10 kb away from the calmodulin, (CAM) gene, Sobic.008G159100 (SBI-08: 59203243 to 59204321), a differentially expressed gene family observed by Kiani & Szczepaniec 45 .
A total of 8 SNPs were associated with NDVI without using owering time as a covariate, with two SNPs on SBI-01, SBI-02, and SBI-05, while a single SNP each on SBI-03 and SBI-06 (Supplementary Fig. S1; Supplementary Table S2-a). The S6_3383155 showed a strong association (RMIP=0.23) with NDVI and was located 631 kb away from RMES1 locus.
A single SNP marker (S9_3054933) associated with SAVI on SBI-09 was also found in either case with or without using owering time as a covariate. NDRE associated markers explained 21-54% of the phenotypic variance, NDVI associated markers accounted for 41% of phenotypic variance and the single marker associated with SAVI accounted for 6% of the phenotypic variance (Supplementary Table S3). SNPs on SBI-05 at 3,230,535 bp and 3,282,948 bp were remarkably closer to each other (52.4 kb) and associated with NDRE and NDVI respectively. A common SNP on SBI-05 at 63,115,845 bp was associated with NDRE and NDVI.

Plant Height
FarmCPU-RMIP identi ed 12 SNPs with a RMIP ≥ 0.05, and these were located on all chromosomes except SBI-04, SBI-05 and SBI-10 ( Fig. 3, Supplementary Table S2-a). The most SNPs were found on chromosome SBI-09 with 4 SNPs that colocalized to the Dw1 locus (Sobic.009G229800) (19 kb to 131 kb away). There were two SNPs identi ed on SBI-06 that were close to the Dw2 locus, about 863 kb to 1.58 Mb away. There was one SNP, S7_59614079 that was located within the Dw3 locus and was 3 kb away from the start position for this gene. There were two SNPs on chromosome SBI-01 and single SNPs on chromosomes SBI-02, SBI-03 and SBI-08. The strongest association of SNP with plant height was identi ed on SBI-09 (S9_57172609; p = 1.41 x 10 -15 ; RMIP=0.48). The GLM results identi ed 78 markers on chromosomes SBI-06 and SBI-09 that colocalized to the Dw1 and Dw2 loci with markers as close as 9kb (S9_57051085) to 1.5 Mb from these loci. Markers on other chromosomes were not detected. These SNPs explained 74 % of phenotypic variance (Supplementary Table S3). All these loci were detected in previous studies compiled in Sorghum QTL Atlas by Mace et al. 47 except two locations. The two SNPs, S2_6406243 on SBI-02 and S8_45638928 on SBI-08 were found to be new loci associated with plant height in sorghum from this study. Maturity/Flowering time GLM analysis identi ed two SNPs that are located on SBI-03 in close proximity (S3_4423241, and S3_4423302) with high con dence (postpermutation p-value ≤ 0.01) (Fig. 3, Supplementary Table S2-a). FarmCPU model selection identi ed 17 SNPs across all chromosomes except SBI-05 and SBI-10, including a highly associated SNP at the end of SBI-03, a cluster of SNPs on SBI-06, and a strongly associated SNP at the end of SBI-09. These SNPs explained 63 % of the total phenotypic variance for maturity (Supplementary Table S3). One MTA on SBI-01 (S1_7584419) was ~1.5 Mb away from the known maturity gene Ma5 (PHYC) 48, 49 , while S6_40987299 was 0.7 Mb away from major maturity locus Ma1 (SbPRR37) 50 . The other loci identi ed in this study are also found in the vicinity of QTLs reported in Sorghum QTL Atlas and from other studies 47,51 , with the ones on SBI-03 having the most signi cant associations. Some of the loci were not detected in previous studies compiled in the QTL atlas of Mace et al. 47 and are reported here as novel loci identi ed in this study as they were very far from the known maturity loci (Supplementary Table S2-a). There were 6 new loci and these were at least 10 Mb away from known maturity loci or at least 100-200 kb away from the known existing QTLs. There were two loci on SBI-01 (S1_30327933 and S1_30462377 that were 30 Mb away from Ma3), two loci on SBI-06 (S6_50581550 and S6_50582469, 10 Mb away from Ma1), and one single locus on SBI-02 (S2_2488178, 65 Mb away from Ma2) and on SBI-07 (S7_2559101).

FarmCPU analysis without iteration (single-run)
For traits that did not yield any MTAs by GLM or RMIP analysis, we noticed that running FarmCPU with default parameters (a single run with no resampling) still generated a few hits near known candidate genes. These hits were apparently not robust enough to be picked up in the resampling analysis, and we include them here for the sake of completion and under the assumption that they probably indicate real signal that simply fell below our statistical threshold ( AC2 Aphid count for second-rating dates using FarmCPU single-run Single-run FarmCPU analysis identi ed a total of 7 SNPs signi cantly associated with the second rating dates of aphid count (AC2); three SNPs closely (<100 kb apart) located on SBI-04, one SNP on SBI-05 and three on SBI-06 were associated with AC2. Three SNPs, namely, S6_3381010, S6_3381051, and S6_3381094, were remarkably close to each other and ( Greenhouse evaluation of aphid damage (GHD) rating using FarmCPU single-run A total of 13 SNPs with a signi cant association to GHD ratings were identi ed using FarmCPU single-run analysis. Three SNPs were detected on SBI-01, two each from SBI-02 SBI-03, and SBI-09, and single SNP each detected on SBI-04, SBI-05, SBI-06, and SBI-07 (  Table S3). The RMIP and GLM analysis detected none or low phenotypic variation as only one signi cant SNP associated with GHD rating was identi ed (Supplementary Table S3). The S7_2494675 marker, signi cantly associated with GHD, was consistently found across RMIP and FarmCPU single-run data analyses. FarmCPU single-run analysis also detected a marker (S4_53265404 on SBI-04) associated with greenhouse aphid damage at 23 kb away from Sobic.004G180500 gene that encodes for 12-oxo-phytodienoic acid.
Genomic regions/clusters commonly identi ed for aphid-related traits Using FarmCPU and GLM analyses, overlapping genetic regions (within 500 kb) were consistently identi ed for aphid population number, aphid damage, and re ectance indices. These clusters found across different traits were sorted based on their chromosomal position (Supplementary Table S2-b). We found a few regions on SBI-02, SBI-03, SBI-05, SBI-08 and SBI-10 that were consistently found across different traits. The SNPs, S8_11781182 on SBI-08, and S10_ 2507813 on SBI-10 were consistently identi ed and associated with AC1 and plant damage D2 respectively (Supplementary Table S2-a & b). It is also noteworthy to observe that SNP S8_11408106 associated with aphid damage D1 was 373 kb away from this common genomic region, S8_11781182 on SBI-08. Additionally, three common SNPs, S2_61431704, S3_19558428, and S5_63115845 on SBI-02, SBI-03 and SBI-05 respectively were associated with the re ectance indices, NDRE and NDVI consistently. Another SNP associated with NDRE, S5_63115742 was remarkably close within 103 base pairs of consistently identi ed region S5_63115845 for NDRE and NDVI. This study identi ed two markers (S6_3050581 and S6_3383155) on SBI-06 that were associated with D2 and AC2 found in proximity (298-629 kb away) of RMES1 locus (Sb06g001650). The markers associated with D1 (S6_179752) and AC1 (S6_334458) were 154 kb apart from each other and 2.3 Mb away from RMES1 locus (Supplementary Table S2-b). The greenhouse GHD (S2_551409) and NDVI (S2_887719) associated SNPs on SBI-02 were closely located within 336 kb distance. The SNP marker associated with NDRE (S3_19929613) was within 371 kb from a marker, S3_19558428, consistently identi ed on SBI-03. There were three traits, GHD (S7_2494675), owering (S7_2559101) and AC1(S7_2888228) that were within 64-393 kb which may show that owering time and aphid resistance genes were found in a cluster on SBI-07. Aphid damage, D1(S8_11408106) was remarkably close (373 kb) to the common genomic region S8_11781182 that associated with damage D2 and AC1. The D1 SNP S8_11408106 is located 124 kb away from a gene, Sobic.008G075700, encoding LRR. SNPs associated with owering time (S8_39703004) and NDRE (S8_39703090) were within 86 base pairs in another location. SNPs associated with plant height (S9_56530072) and D2 (S9_56705459) were within 174 kb on SBI-09. Plant height (S9_57804067) and owering loci (S9_57914426) were identi ed within 109 kb on SBI-09.
Identi cation of differentially expressed known SCA genes within 200 kb of MTAs There are two studies that have done in-depth analysis of differential gene expression in sorghum upon sugarcane aphid infestation 45,46 . Our gene analysis was done exclusively to identify the candidate genes around MTAs identi ed in this study using genes expressed in these previously conducted SCA differential gene expression studies 45,46 . The genes from these two studies were searched if they were found within 200 kb distance upstream and downstream of the markers identi ed in our study. The genes that were detected two times due to their association with closely linked markers were removed and only unique genes were retained.
Of the 9,291 unique genes identi ed by Kiani and Szczepaniec,45 that were differentially expressed in resistant and susceptible lines after exposure to the sugarcane aphid, 561 (6%) unique genes were detected within 200 kb distance of the 74 markers associated with sugarcane aphid-related traits (Supplementary Table S4). Of these, eight genes encoding leucine-rich repeats (LRR) were found on SBI-05, SBI-06, and SBI-07. The SNP marker, S6_334458, associated with aphid damage (D1), is 5 kb away from Sobic.006G001900 encoding a LRR gene that was upregulated during aphid infestation in the previous study 45 . Four genes near NDRE associated SNPs on SBI-03, SBI-06 and SBI-08 were similar to lipoxygenase (LOXs), and two genes on SBI-05 and SBI-07 encoded avonol biosynthesis genes (Supplementary Table S4). The NDVI marker, S1-7442679, was found within ~99 kb of the Sobic.001G0955 gene that encodes a WRKY1 transcription factor.
Additionally, we pooled gene datasets obtained from Tetreault et al. 46 that were detected within a 200 kb distance of MTAs. We analyzed genes that were induced in both resistant and susceptible plants upon sugarcane aphid infestation using 16 gene-expression modules in this study. Out of 15074 genes evaluated, 911 (6%) unique genes were within 200 kb region of sugarcane aphid associated SNPs (Supplementary Table S5). The SNP markers associated with aphid damage D2, S6_3050581 is 147 kb away from Sobic.006G018900 encoding protein similar to avr9/Cf-9 rapidly elicited response protein. S2_61431704 associated with NDRE is 339 base pairs away from Sobic.002G222800 encoding protein similar to Putative Avr9 elicitor response protein. The analysis identi ed twenty-ve LRR, four LOXs, four CAM dependent protein kinases, two WRKY transcription factors, and one avonol 3-O-glucosyltransferase.
In total, twenty-nine LRR, four CAM proteins, four lipoxygenase (LOXs), two WRKY, and two avanol biosynthesis genes were identi ed from the above two studies (all supplementary les of Kiani and Szczepaniec, 45 and 16 selected modules of Tetreault et al. 46 (Supplementary Table   S4 and S5). When all the modules from the Tetreault et al. 46 study were used for candidate gene analysis, one 12-oxophytodienoic acid reductase gene (Sobic.004G180500) was found very close within 23 kb from S4_53265404. The Sobic.007G151100 encoding 12oxophytodienoic acid reductase was also found within 483 kb to a marker, S7_58778565, associated with aphid count (AC1). We also found three LRR genes on SBI-05 and SBI-07, and one WRKY1 gene on SBI-01, which were commonly found in the studies by Kiani and Szczepaniec, 45 and Tetreault et al. 46 .
We also analyzed the number of genes and gene frequency on individual sorghum chromosomes found near our SNPs associated with different aphid traits, by using these two datasets 45,46 . The highest gene frequency distribution was observed on SBI-01 followed by SBI-06 ( Supplementary Fig. S3-a & c). The majority of these genes were captured by markers associated with NDRE and D2 aphid traits ( Supplementary Fig. S3-b & d).
The markers in the current study that showed more than 10 genes were considered as genic-rich regions as they may reveal a cluster of genes around them. Our result indicated that a total of 59 markers were associated with more than 10 genes found in the above two gene expression data 45,46 . These clusters ranged from 11-32 candidate genes for each of these markers within 200 kb that might have promising genic-rich regions/clusters related to sugarcane aphid resistance (Supplementary Table S6).

Candidate gene analysis near consistently identi ed markers
The candidate gene analysis was focused near common and consistent markers identi ed between different traits on SBI-02, SBI-03, SBI-05, SBI-08 and SBI-10 using previously known SCA genes 45,46 (Supplementary Table S4 and S5). A total of 25, 3, 1, 1, and 32 genes were found near these consistently identi ed markers, S2_61431704, S3_19558428, S5_63115742, S8_11781182 and S10_2507813, respectively. Some of the noted ones were Sobic.005G158000; oxidative stress response genes-similar to Cytochrome P450 71E1 found on SBI-05 and Sobic.008G077600-similar to Diacylglycerol kinase 1, found on SBI-08 as they were narrowed down to single genes from the previous gene expression studies around these markers. S2_61431704 marker was close to Avr9 elicitor response protein found in both studies. Among 32 genes found near S10_2507813, several LRR repeats were detected and one of them, Sobic.010G030000 was within 77 kb.

Discussion
Genetic and molecular basis of aphid resistance among agriculturally important crops has often been associated with the involvement of large-effect R genes, highlighting gene-for-gene Flor's hypothesis [52][53][54][55] . But often we also see the small effects of several other genes/QTLs that contribute to aphid resistance in sorghum 56 . Sugarcane aphid resistance is not completely understood owing to its recent emergence of pest status on sorghum in North America. Some studies have attempted to understand the molecular basis of this interaction in sorghum as monogenic, though it is believed to be oligogenic and polygenic [23][24][25]56 . Most of the studies highlighted the importance of RMES1 locus located on SBI-06 and the causative variants around this region to be the major resistance gene for sugarcane aphid resistance in sorghum depending on the source of resistance used in these studies 24,25,27,56 . SCA resistance is monogenic (single locus) for RME6 ACK60 x RTx2783. However, in other sources it could be polygenic.
This study utilized 283 accessions from SAP population grown in the eld and greenhouse to identify SNPs associated with sugarcane aphid resistance in sorghum. It combined phenotypic data from two years of eld-based visual ratings with one year of re ectance data, and greenhouse evaluations to identify genome-wide associated (GWAS) marker-trait associations (MTA) using genotyping-by-sequencing (GBS) data. GWAS analysis identi ed a total of 200 markers for 8 different aphid resistance-related traits, owering time, and plant height (Supplementary Table S2-b). MTAs were identi ed through GWAS studies that used three different methods such as RMIP in FarmCPU, GLM and single-run FarmCPU. These methods were robust in identifying high quality and high con dence GWAS-QTLs with stable genomic positions for MTAs as owering time was used as a covariate in much of the analysis. Using owering time as a covariate reduced any signi cant spurious associations with owering time genes and separated them from most of the other MTAs. The signi cant associations were further explored to identify candidate genes that are known to in uence sugarcane aphid resistance in sorghum. We noticed that the SNPs associated with the re ectance indices such as NDVI were not detected when owering time was used as covariate. We also found that owering time SNPs did not collocate with these re ectance indices indicating owering time genes were different from NDVI SNPs. When owering time was not used as covariate, NDVI identi ed signi cant MTA's that had aphid related biological signi cance. At this time, it is not clear if owering morphology would impact these traits.
The GWAS analysis identi ed 96 markers related to sugarcane aphid resistance traits. After removing commonly identi ed duplicate markers associated with aphid-related traits, about 85 markers were at unique locations. Among these markers, UAS re ectance indices (NDRE, NDVI and SAVI) captured 28 markers, which accounted for 33% of the aphid related traits. The remaining 57 markers (67%) were captured by visually recorded SCA traits (D1, D2, AC1, AC2, and GHD). In the current study, the high throughput phenotyping has enhanced our ability to tag aphid resistance and plant health traits and also supported visual scoring with a few common markers. The phenotypic variance explained by these markers varied from 0-63% for all aphid resistance-related traits (Supplementary Table S3). Among the 96 MTAs (manual scoring, greenhouse, and drone) markers, S1_14448039 on SBI-01 explained the highest phenotypic variance for aphid damage D2. The S8_11781182 marker found on SBI-08 had higher RMIP and GLM permutation values for D2 traits. These markers could be potential indicators of metabolic pathways or hormonal signaling defense pathways for aphid resistance. A signi cant correlation was observed between UAS re ectance indices and visually recorded aphid traits rating indicating that both methods quantify the damage that occurred on sorghum plants due to SCA infestation. The negative correlation of NDVI, SAVI, and NDRE with aphid damage ratings indicated that plant stress increased with increasing plant injury. Higher values for NDVI, SAVI, or NDRE indicated lower stress levels, while lower NDVI, SAVI, or NDRE values were associated with higher plant stress due to aphid damage. This study con rms with previous studies 57,58 . Further study may be needed to identify the best index to analyze the impact of the sugarcane aphid pests on crop yield and biomass.
Among aphid resistance traits, the maximum number of markers,16, was identi ed on SBI-06. Recent studies have speci cally dissected regions around the RMES1 locus and found several NBS-LRR containing genes that were associated with resistance gene products 25,27,56 . In our investigation, we were able to locate LRR motifs and Avr9 elicitor response protein within 200 kb region for markers associated with aphid damage and aphid count on SBI-06. Recently, Muleta et al. 56 also found regions on SBI-06 and other chromosomes for sugarcane aphid resistance and our study also found these regions to be involved with aphid damage, aphid count, greenhouse aphid damage evaluation and UAS-based re ectance vegetation indices. When we compared our markers with these FSt outlier markers for sugarcane aphid resistance, 34 markers were found to be in close proximity (within 500 kb) to 40 markers identi ed in Muleta et al. 56 (Supplementary Table S7). These markers were as close as 0 to 356 kb between the two studies and most of these markers came from SBI-06-and SBI-09.
There are only a few studies that have elucidated the changes in gene expression upon sugarcane aphid infestation in sorghum. In our study, we were able to detect 6% of known unique genes from these two datasets related to sugarcane aphid response using our MTAs identi ed.
Our candidate gene search was limited to these two datasets from the existing two experiments, and our window to examine these genes was only 200 kb within signi cant markers 45,46 . Therefore, we also looked at some potential candidate genes that were beyond 200 kb and some genes that may not be reported in these two experiments.
Kiani and Szczepaniec, 45 reported that 2-week-old and 6-week-old resistant and susceptible plants orchestrated several defense pathways to sugarcane aphids by induction of hormone-signaling pathways coding for secondary metabolites, glutathione metabolism, and plantpathogen interaction. Also, genes associated with the metabolic process, biosynthetic process, macromolecule process, and oxidationreduction were upregulated in the younger resistant plants as enriched GO terms. Genes related to the Ca 2+ sensor group, which encodes calmodulin-like (CMLs) and calmodulin (CAM) proteins showed up-regulation in response to aphid injury regardless of sorghum genotype or developmental stage 45 . Our UAS-based result indicated that the marker identi ed on SBI-08, SNP S8_59192389 associated with NDRE is approximately 10 kb away from a gene that encodes a CAM family protein, Sobic.008G159100 (SBI-08: 59203243 to 59204321), a differentially expressed gene family observed by Kiani and Szczepaniec,45 . In this study markers such as S1_27239199, S4_53265404, and S5_3230535, associated with greenhouse, NDRE, NDVI, and aphid count are located within 200 kb away from the genes encoding calmodulinbinding family proteins (Supplementary Table S4). These genes are expressed upon plant infestation by aphids 45, 46, 59 .
Tetreault et al. 46 found in general that aphid infestation induced global gene expression changes in the susceptible line that re ected elevated levels of herbivory-related stress and cessation of plant growth through deployment of plant defense genes such as nucleotide-binding-site leucine-rich-repeats (NBS-LRR), peroxidases, glutathione S-transferases, laccases and several phloem proteins, wound responsive genes and WRKY transcription factors. We found S6_53836196 that was very close to phloem proteins found in the above study 46 . We found at least 29 LRR regions that were all associated with aphid resistance traits (Supplementary Table S4 and S5). Most of the LRRs found in this study were on SBI-05, SBI-06, and SBI-07. There was one LRR family containing protein on SBI-08, Sobic.008G075700 which was ~500 kb away from S8_11781182 that was consistently found between GLM and RMIP with higher thresholds. This region from S8_11408106 to S8_11781182 (3 73 kb) was consistently found for aphid damage D1, D2 and aphid count AC1 traits and the same region was also in close proximity (127 to 245 kb) to markers found by Muleta et al. 56 . The S6_53836196 marker associated with D2 was located ~99 kb away from Sobic.006G184300 gene on SBI-06 that encodes a phloem protein 2 gene and this gene showed an increased gene expression due to sugarcane aphid infestation 46 .
Zhu-Salzman et al. 60 reported that treatments of sorghum (Sorghum bicolor) seedlings with methyl jasmonate resulted in a reduced number of greenbugs aphids (Schizaphis graminum) compared with the control plants, suggesting the signi cance of jasmonic acid (JA)-pathwaymediated defense in sorghum against aphids. Grover et al. 61 found the involvement of phytohormone 12-oxo-phytodienoic acid (enzyme,12oxophytodienoate reductase), a precursor involved in jasmonic acid (JA) biosynthesis pathway as the main volatile component involved in aphid resistance in the sorghum SCA-tolerant SC35 plants. In the present investigation, we found two locations, S4_53265404 (GHD) was ~29 kb away, and S7_58778565 (D1 and AC1) was 479 kb away from this gene, 12-oxo-phytodienoic acid reductase, Sobic.007G151100. This enzyme, 12-oxo-phytodienoate reductase, is involved in recognizing aphid attacks and induces JA hormonal response. The early recorded traits such as AC1 and D1 (immediately after aphids were observed) and GHD rating response reveals early onset response and the markers, S4_53265404 and S7_58778565 associated with these traits (GHD, D1 and AC1) tagged to 12-oxo-phytodienoic acid suggest that these genes are the rst line of defense when aphids attack.
Lipoxygenase (LOXs) genes are known to play a signi cant role in aphid resistance in sorghum and other crops 62-64 . Shrestha et al. 65 performed a genome-wide analysis of the sorghum lipoxygenases (LOXs) genes and identi ed the expression of nine LOXs genes. In this study, we found four lipoxygenase genes (Sobic.003G385500, Sobic.003G385900, Sobic.006G248300, and Sobic.008G157900) within 200 kb of S3_69633709, S6_58686833, and S8_59192389 markers on SBI-03, SBI-06, and SBI-08. Among nine 9-LOXs found by Shrestha et al. 65 , SbLOX3 (Sobic.003G385500) and SbLOX4 (Sobic.003G385900) were found to be duplicated genes and induced in early response to sugarcane aphids. The marker, S3_69633709, associated with NDRE, was found within 163 kb and 193 kb of SbLOX3 and SbLOX4, respectively. SBLOX9 was found to play an important role in resistant plants in early response to sugarcane aphid 65 . There were a few markers close to SbLOX9 on SBI-06 but they did not pass signi cant thresholds set in GLM and/or RMIP methods. The LOX genes identi ed in our study were also reported by Kiani and Szczepaniec,45 and Tetreault et al. 46 (Supplementary Table S4 and S5).
The use of FarmCPU and GLM were robust and were able to identify previously identi ed dwar ng and maturity genes.  [73][74][75][76][77] . Of these six loci, Ma1 is known to be the major locus controlling maturity and has greatest sensitivity to LD (Long Day) conditions and is closely linked to Ma6 and Dw2 on SBI-06 50 . In our study, we found several owering time QTLs/genes across seven chromosomes. These were located on SBI-01, SBI-02, SBI-03, SBI-06, SBI-07, SBI-08, SBI-09. Of these, a marker linked to Ma1 on SBI-06 was detected in this study within close proximity (Supplementary Table S2-a) 50 . Mace et al. 78 in a study of the Sorghum BC-NAM population, revealed some weak QTL on SBI-03, none on SBI-06, and a strong one on SBI-09 in the same place as the strong RMIP-identi ed MTAs in this study. The majority of these loci agree with the existing literature on owering time genes in sorghum 47 . The identi cation of height and maturity loci to known regions from existing literature supported that our methods used in this study were robust and su cient to resolve GWAS QTLs. This study has also identi ed six new genomic locations associated with owering time genes that did not show in other studies to the best of our knowledge.
In this study, SNP markers on SBI-08 at 11,781,182 bp and SBI-10 at 2,507,813 bp were consistently identi ed as being associated with aphid count and aphid damage and SNP markers S2_61431704, S3_19558428, and S5_63115845 for re ectance indices-based traits were identi ed consistently on SBI-02, SBI-03, and SBI-05 respectively. The candidate gene search around these markers showed more on LRR genes, Avr9 elicitor response protein, kinases and oxidative stress response genes. These pleiotropic regions may harbor potential candidate genes for SCA resistance.
SAP represented in this study has shown 5 highly resistant lines and 12 highly susceptible lines and these accessions may represent potential sources for SCA resistance among 283 lines used. Most of the studies conducted to-date for SCA resistance have limited number of parental sources of resistance and reveal more on RMES1 locus. This study is the rst to assess SCA phenotyping in the eld for two years using a large panel. This study reveals that SCA resistance may be controlled by different resistance genes based on the resistant source.

Conclusion
This study combined phenotypic evaluations from eight different traits conducted in the eld and greenhouse to assess sugarcane aphid resistance in sorghum. In this research, UAS-based re ectance data was shown to be a valuable HTP tool for the rapid assessment of sugarcane aphid damage. The use of GWAS using the SAP population rated for sugarcane aphid resistance traits identi ed the RMES1 locus and its surrounding cluster of genes but also new regions on SBI-02, SBI-03, SBI-05, SBI-08 and SBI-10 that were consistently found across different traits. The methods employed in GWAS were robust in stabilizing the genomic positions of MTA. Furthermore, leveraging existing gene expression data on sugarcane aphid resistance allowed us to identify candidate genes that are closely located within the signi cant markers identi ed in this study. Moreover, some of these genes were known to be involved in aphid resistance. This work has provided several markers that could be useful in marker-assisted selection for developing sugarcane aphid resistant varieties. Functional validation and positional cloning are needed to prove their involvement in sugarcane aphid resistance. Also, in the future, understanding of positive and negative epistatic interaction among major QTLs is needed to unravel the genetic architecture of sorghum for sugarcane aphid resistance breeding. This study has identi ed ve resistant (1-3 score) lines from the SAP that could be valuable for future breeding. Resistant sources that contain these new sugarcane aphid resistance regions could be utilized for developing sugarcane aphid resistant varieties of sorghum.

Plant Materials
All methods were performed in accordance with the relevant guidelines/regulations/legislation. Plant materials consisted of 276 lines from the SAP 39 and seven additional lines that have known insect resistance/susceptibility traits. The full list of plant accessions used in this experiment and their source is in Supplementary Table S8. were established at the beginning of the study and marked with a temporary landmark, using a Trimble Geo7X global navigation satellite system receiver with real-time kinematic correction (i.e., RTK), providing +/-2 cm horizontal accuracy at the coordinate location. Prior to each ight, the GCPs were targeted, and calibration images were collected using Micasense calibration protocols. Pix4DMapper software (v. 4.6.4; Pix4D SA, Switzerland) was used to produce orthorecti ed mosaics, photogrammetric point clouds and surface elevation models, georeferenced to within a satisfactory root mean square error (RMSE). The nal spatial resolution of the mosaicked images was 2.64 cm. The

SNP Data
Genotype data from the SAP was obtained from Morris et al. 72  where σ 2 g is the genotypic variance, σ 2 gy is the genotype by year variance, and ε is the residual error variance. Heritability values for UASbased re ectance indices for speci c date, Aug 25, 2020 is reported as it coincided with visual scoring dates and had a higher correlation than the other dates (Supplementary Table S1-e).

Genome-wide Association Analysis (GWAS)
From 281 Sorghum Association Panel (SAP) accessions, 458,313 SNPs were obtained of which 191,738 SNPs with a MAF ≥ 0.05 were selected and used for GWAS analysis. There were two accessions that did not have GBS data from our collection of 283 lines. Analyzed traits consisted of visual eld assessment of sugarcane aphid damage scores (D1 and D2), estimated aphid count (AC1 and AC2), greenhouse visually scored aphid damage (GHD), UAS-based re ectance data recorded on Aug 25, 2020, for Normalized Difference Vegetation Index (NDVI), Normalized Difference Red Edge (NDRE) and Soil Adjusted Vegetation Index (SAVI), plant height (PH), and owering time (FL) performed using Fixed and random model Circulating Probability Uni cation (FarmCPU). For traits that were not normally distributed such as aphid count and damage scores, plant height, and owering time, log transformation was applied.
To control the population structure, the rst three principal components (PCs) were used as covariates for all traits. In addition, owering time was used as a covariate for aphid damage, aphid counts and UAS-based vegetative indices to remove associations due to plant maturity, and nSPAD readings were used as covariates for greenhouse visually scored aphid damage.
GWAS with TASSEL-GLM 1000 GWAS was performed in TASSEL v5.2.73 41 using a generalized linear model (GLM) with 1000 permutations to determine statistical signi cance. (The ability to rapidly perform permutations was the reason for choosing GLM over a mixed linear model with a kinship matrix. See the section on Model Selection, below, for a complementary GWAS analysis that includes a kinship matrix.) Population structure was controlled by including the rst three genetic principal coordinates as covariates, calculated using the distance matrix and MDS functions in TASSEL.

GWAS with FarmCPU-Model selection RMIP-method
To complement the GLM analysis above, model-selection GWAS was performed with FarmCPU 42 as part of the GAPIT3 91,92 software package for R. The same genotype le was used for all traits, and the model included three genetic principal coordinates and a kinship matrix calculated using the VanRaden method 93 with the function GAPIT.kinship.VanRaden().
Early tests found that the FarmCPU results, like many model selection algorithms 94 , were unstable when six samples were accidentally excluded from the analysis and generated very different lists of signi cant loci. To overcome this instability, we performed model selection in two steps. First, a null distribution of results was created by randomly permuting the genotypes relative to the phenotypes 100 times. Kinship was not permuted to maintain the covariance with phenotypes, and a random 10% of samples were excluded each permutation to match expected power with the following resampling step. The results from this step formed a null distribution of p-values and allowed us to determine which results are more signi cant than due to chance.
The second step consisted of running the actual (non-permuted) genotypes 100 times, each time randomly excluding 10% of samples.
Resampling allowed us to generate Resample Model Inclusion Probabilities (RMIPs) 43 , a measure of how stable each genomic association is.
SNPs with RMIP scores above 0.05 (meaning 5 out of 100 models) were considered signi cant 44 .

Phenotypic Variance Explained
To estimate the variance explained by each marker (or set of markers), we used GAPIT3 91,92 to calculate a kinship matrix using the VanRaden method 93 and the prcomp() function in R to derive genetic principal components (PCs). For each phenotype, we t a model including the rst three PCs, the kinship matrix, and any covariate phenotypes (e.g., owering time), as with the GWAS analyses. The residuals from that model were then t in a second linear regression with the SNP(s) of interest to determine the fraction of residual variance explained in the form of the model R 2 . Code for this analysis is available on GitHub at https://github.com/wallacelab/paper-punnuri-aphids-2021.

GWAS with single-run FarmCPU
For those traits that did not pass-through signi cant thresholds and also showed weak signals in the above two methods, GWAS was also performed with FarmCPU in GAPIT package from R software without any iterations. The results from this analysis are displayed in the supplementary les and were only performed in the case of aphid count second-dates rating (AC2) and greenhouse evaluation data as they showed genes relevant to sugarcane aphid herbivory on sorghum. The results need to be further validated as no iterations were conducted in this method.

Candidate gene search from previous gene expression studies
Candidate genes were also selected based on data available on differentially expressed genes (DEGs) between resistant and susceptible sorghum from two previous experiments 45,46 . There were sixteen modules from Tetreault et al. 46 that were used in identifying potential candidate genes for sugarcane aphid infestation. These module numbers were 1, 2 ,4, 5, 6, 7, 9, 10, 12, 14, 15, 16, 17, 18, 19, and 23 which were known to be involved in sugarcane aphid infestation response. DEGs within 200 kb of a MTA (either GLM or FarmCPU resampling) were considered potential candidate genes. Gene distances and annotations were based on the Sorghum bicolor genome v3.1.1 available on Phytozome (https://phytozome.jgi.doe.gov/).  show where several weak loci indicate a signi cant but poorly resolved locus (such as on chromosomes SBI-04, SBI-05, and SBI-06 for NDVI).