Cost-effective library preparation for whole genome sequencing with feather DNA

Samples from avian species of high conservation concern are often low in total genomic DNA (gDNA). Feathers are collected more frequently than blood, yet they contain low total gDNA which can present challenges in using them for genomics. Whole Genome Sequencing (WGS) can provide many insights that can be used to aid in species conservation, but current methods for working with feathers are cost prohibitive for population level genomic analyses. Thus, there is a need for a cost-effective method of preparing WGS libraries from feathers. To bridge the gap between sampling techniques commonly used in avian conservation genetics that yield low total gDNA and the powerful tool of WGS, we developed a method that successfully produces WGS libraries from feather gDNA with as low input as 0.48 ng of DNA and an average final library size of 300–500 base pairs. Sequencing results suggest high sequencing quality using our protocol with feather DNA. We conclude that our method will facilitate high-throughput WGS on low total gDNA samples, like feathers, thus expanding the power of genomic tools in critical avian conservation research.


Introduction
The field of genomics began developing in the 1990s with whole genome sequencing of bacteria (Fleischmann et al. 1995) and has advanced rapidly with the improvement and reduction in cost of High-Throughput Sequencing (HTS) techniques (Schloss 2008). The ability to sequence whole genomes with relative ease has opened new research avenues and made it possible to estimate fundamental population genetic parameters with increasing precision in both model and non-model organisms (Allendorf et al. 2010;Ouborg et al. 2010). The use of HTS is highly relevant to ecologists and conservation biologists. This technique has allowed the exploration of previously challenging topics such as the genetic basis of local adaptation, patterns of inbreeding across the genome, and how species adapt to changing climate conditions (Kohn et al. 2006;Allendorf et al. 2010;Ouborg et al. 2010;Ruegg et al. 2018). As a result, genomic tools are revolutionizing the fields of ecology, evolution, and conservation biology.
Despite the proliferation of HTS methods for model organisms (Ekblom and Galindo 2011), there remain a number of technical and financial limitations to the widespread use of genomic approaches in situations where the amount of input DNA is limited. While the high cost of sequencing has dramatically decreased in the last two decades (Goodwin et al. 2016), it is often still prohibitively high for use in population-level studies where hundreds or thousands of individuals must be sequenced (Fuentes-Pardo and Ruzzante 2017). Methods modified from commercially available WGS library preparation kits offer low coverage options at a fraction of the cost per individual, making them suitable for population genetics studies (Kryazhimskiy et al. 2014;Baym et al. 2015;Therkildsen and Palumbi 2017). However, these methods can be unsuccessful at amplifying certain sample types, like DNA extracted from feathers which is typically low in total genomic DNA (gDNA). Given the immense potential benefits of analysis of whole genome data for effective avian conservation and management efforts (Ryder 2005;Funk et al. 2012;Russello et al. 2015), there is a need for a cost-effective method for preparing WGS libraries from feathers.

3
Low total gDNA samples, for example feathers, are a hallmark of non-invasive or minimally invasive sampling techniques in birds. Minimally invasive sampling entails capturing or handling a study organism with minimal invasion or tissue collection (e.g. feather pulls and buccal swabs) (Carroll et al. 2018). Non-invasive sampling can also include saliva, hair, feces, or feathers. Non-invasive genetic sampling methods first gained recognition in 1992 when DNA was successfully extracted from passively-collected hair for a genetic study of an endangered bear species (Taberlet and Bouvet 1992). Since then, non-invasive genetic sampling has been successfully used in genetic studies across myriad taxonomic groups (Valière et al. 2003;Regnaut et al. 2006;Stenglein et al. 2010;Roques et al. 2014). In recent years, non-invasive and minimally invasive sampling methods have gained popularity, especially for use in monitoring threatened and endangered species (Lukacs and Burnham 2005;Fuentes-Pardo and Ruzzante 2017).
Although low cost methods for WGS exist (Kryazhimskiy et al. 2014;Baym et al. 2015;Therkildsen and Palumbi 2017), we did not find them to be successful at amplifying DNA from feathers. More specifically, further analysis of the Therkildsen and Palumbi (2017) method revealed that much of our feather DNA was wasted during the library preparation. When prepared with this method, our feather samples had average fragment sizes of 1 kb after tagmentation and indexing. Yet, many sequencing platforms, such as Illumina, require 300-500 bp amplicons, although this varies between vendors and platforms. Thus, DNA above 500 base pairs is often removed prior to sequencing resulting in considerable loss of DNA. To bridge the gap between sampling techniques commonly used in avian conservation genetics that yield little gDNA and the powerful tool of WGS, we modified the Therkildsen and Palumbi (2017) protocol to develop a method that successfully produces libraries from feather gDNA with as low input as 0.48 ng of DNA, with an average final library size of 300-500 base pairs.
We identify three parameters to adjust to maximize efficiency for feather WGS and demonstrate the utility of our method for producing high quality sequencing data at a fraction of the cost of traditional commercial library preparation methods using DNA extracted from a single flight feather calamus, or quill, of a small (8-9 g) passerine bird, the American Redstart (Setophaga ruticilla). The American Redstart is a long-distance migratory species with a breeding distribution across Canada and the United States, and a nonbreeding distribution across the Caribbean, Central America, and northern South America. While the American Redstart is not listed as a species of concern, the species has declined by approximately 1.1% per year over the past half century (Sauer et al. 2020). Previous genetic studies on this species have used blood samples to identify phylogeographic structure (Colbeck et al. 2008) and breeding origins of migrating individuals (DeSaix et al. 2022). Using feathers for genetic studies in such a species would greatly enhance researchers' abilities in collecting a large sample size across a broad distribution. In this study, we use this species to demonstrate the efficacy of our library preparation protocol for smaller bird species. We assess the WGS data from our DNA libraries generated from feathers and demonstrate that our method produces high quality sequencing data. These results have important implications for avian conservation genomics research seeking to maximize efficient sequencing from low total gDNA samples like feathers.

Library preparation for sequencing
We identified three key parameters in the Therkildsen and Palumbi (2017) method that we could optimize to target the ideal fragment distribution and minimise critical DNA loss when working with feather DNA samples. The three key parameters modified herein were: (1) the ratio of tagmentation transposome (which cleaves DNA and adds an adapter for indices) to input DNA quantity, (2) the duration of tagmentation incubation, and (3) the duration of the indexing PCR elongation time. We identified these parameters as key because: (1) the ratio of tagmentation transposome to DNA molecules determines the average fragment length, (2) the duration of tagmentation incubation determines the efficacy of fragmentation by allowing enough time for tagmentation to occur, (3) the indexing PCR elongation time dictates the average fragment length of amplified DNA. We conducted initial tests to determine the most efficient modifications to the protocol (not presented in this manuscript) and finalized the protocol using those adjusted parameters.
To assess the efficiency of our method, we extracted DNA from 50 flight feather calamus tips, also called the quill or the hollow tip of the feather that is attached to the bird, of the American Redstart. Feathers were stored in envelopes at − 20 °C until extraction and were collected between 2000 and 2019. Prior to extraction, single feathers were removed from their envelopes using gloves. The quill, also called the calamus, was cut off using a sterile razor blade. The rest of the feather was retained for potential future analyses, such as isotope analysis. DNA was extracted immediately prior to library preparation. For extractions from feathers, like other low-yield samples, maximizing gDNA yield is critical. Therefore, we followed the commercially available DNEasy Blood and Tissue Extraction Kit (Qiagen) protocol but with the following modifications. To each sample, we added 10 μL of 1 M Dithiothreitol (DTT) to the initial lysis step to aid in breaking down disulfide bonds found in the keratin of feathers. Flowthrough after the first filtration step when lysate was transferred to the spin column was pipetted back onto the filter for a second centrifugation. Prior to the final elution step, AE buffer was placed in an incubator at 56 °C to increase DNA yield (Winters et al. 2018;Peters et al. 2020;Adams et al. 2022). During the final elution step, AE buffer was left to incubate on the filter for five minutes instead of two (Tsai et al. 2020). We eluted feather extractions into 400 μL (two rounds of 200 μL elutions through the spin column as recommended by Qiagen protocol for maximum yield). Prior to proceeding with library prep, we concentrated feather DNA extractions using a 1:1 ratio of Serapure beads (Faircloth and Glenn 2014) from 400 to 15 μL and eluted into 10 mM Tris-Hcl (Fig. 1, step 1a).
To ensure that final library concentrations will be similar across samples, we quantified each DNA extraction using a Qubit dsDNA High Sensitivity Assay Kit (Invitrogen) and normalized each sample to a concentration of 0.48-4.5 ng/ μL, with a target of 2.5 ng/μL (Fig. 1, step 1b). To fragment the DNA and tag it with Nextera adapters, we added 2.50 μL of TD Buffer and 0.5 μL of TDE1 Enzyme (Illumina) to 1 μL of normalized DNA (e.g. 0.48 ng to 4.5 ng of DNA) and incubated the samples in a thermocycler at 55 °C for 20 min (Fig. 1, step 2).
To amplify the tagmented DNA and add Nextera indexing adapters for sequencing, we pipetted 1 μL of each index primer into the appropriate well of tagmented DNA. We then added 6.0 μL of Kapa Hifi Hotstart Mix (KMM; Kapa Biosystems) before running in a thermocycler as follows: held at 72° for 3 min, held at 98° for 2 min and 45 s, cycled 8 times through 98° for 15 s, then 62° for 30 s, then 72° for 30 s, held at 72° for 1 min, and then held at 4° until removed from thermocycler ( Fig. 1, step 3). As per the Therkildsen and Palmubi (2017) method, this indexing PCR had more cycles than the original Illumina protocol and was broken into two stages (Indexing PCR and Reconditioning PCR). Additional cycles were added in the Therkildsen and Palumbi (2017) method because the tagmented DNA was not purified prior to the Indexing PCR, making the PCR reaction less efficient. Tagmented DNA was not purified prior to PCR amplification to decrease protocol duration while minimizing potential loss of DNA. To further amplify copies of indexed DNA without using additional Nextera indices, we added 7.6 μL of KMM, 4.4 μL of ultrapure water, and 1.6 μL each of a custom 10uM primer pair (P1 = AAT GAT ACG GCG ACC ACC GA; P2 = CAA GCA GAA GAC GGC ATA CGA) to each library. We ran the samples in a thermocycler as follows: held 95° for 5 min, cycle 4 times through 98° for 20 s, then 62° for 20 s, then 72° for 2 min, hold at 72° for 2 min, and then held at 4° until removed from the thermocycler (Fig. 1,  step 4).
To purify the PCR product and remove undesirable fragments, we followed standard Ampure bead protocol (Beckman Coulter) using a 0.7:1 bead to DNA ratio which will remove amplicons below 320 bp in length, and eluted into 30 μL of 10 mM Tris-Hcl (Fig. 1, step 5). In order to avoid overrepresentation of one individual during whole genome resequencing, we then quantified using a Qubit dsDNA High Sensitivity Assay Kit (Invitrogen) and pooled an equal number of copies of each sample into a 1.5 mL tube (Fig. 1,  step 6). Finally, to increase the final concentration of the pooled libraries and increase sequencing efficiency, we then followed the standard Ampure double-sided size selection protocol, using a 0.63:1 bead to DNA ratio to remove large fragments and a 0.73:1 bead to DNA ratio to remove small fragments, and eluted into 30 μL of 10 mM Tris-HCl (Fig. 1, step 7). We perform a second clean-up in order to increase the final library concentration by decreasing the final volume. This clean-up has the added benefit of reducing the variation between individually size-selected samples by reducing the width of the final fragment distribution peak. After the pooled library has been concentrated and double size selected, we perform final quality control (QC) with Fig. 1 Lab workflow diagram for our method modified from Therkildsen and Palumbi (2017). The blue and red tubes represent two unique DNA samples being prepared and pooled for WGS Qubit quantification and Tapestation 2200 fragment distribution analysis (Agilent) (Fig. 1, step 8).
Using the above method (for full written protocol, see Online Resource 1), we prepared one WGS library from American Redstart (Setophaga ruticilla) samples for low coverage WGS (< 2x). The library was prepared with 50 unique feather samples, extracted from a single flight feather calamus, with starting DNA concentrations of 0.48-4.5 ng/ μL. The library was sequenced on one full 2 × 150 bp PE (paired end) HiSeq 4000 lane (Illumina).

Bioinformatic analysis and quality checking
We trimmed sequence adapters using the program TrimGalore version 0.6.5 (https:// github. com/ Felix Krueg er/ TrimG alore), a wrapper for Cutadapt (Martin 2011). We used the Burrows-Wheeler Aligner software version 0.7.17 (Li and Durbin 2010) to map reads to an assembled genome of the yellow warbler (Setophaga petechia; Bay et al. 2018). After mapping, the resulting sequence alignment map (SAM) files were sorted, converted to binary alignment map (BAM) files, and then indexed using Samtools version 1.9 (Li et al. 2009). We marked and removed read duplicates with MarkDuplica-tesSpark from GATK version 4.1.4.0 (McKenna et al. 2010). We used Picard software (http:// broad insti tute. github. io/ picard/) to check sequence quality of the BAM files; specifically we used CollectWgsMetrics to measure coverage depth and the proportion of reads to pass quality filters. To check for an effect of sample DNA quantity on sequencing coverage depth, we implemented linear regression models using the lm() function in R version 4.1.2 (R Core Team 2021). All bioinformatic analysis figures were produced using the Tidyverse package (Wickham et al. 2019) in R.
To demonstrate the utility of our library preparation protocol for conservation genomic applications, we used ANGSD version 0.939 (Korneliussen et al. 2014) to call single-nucleotide polymorphisms (SNPs). ANGSD is particularly suited for low-coverage WGS data as it uses genotype likelihoods in a probabilistic framework instead of calling genotypes. We identified SNPs at sites with a probability of < 1e−6 of being monomorphic and only included sites that had read data in at least 50% (25) of the individuals. Sites were excluded that had a total read depth of > 32x (average depth * minimum number of individuals) and > 130x (2 * average read depth * maximum number of individuals). Given our lack of a set of known variants for base quality score recalibration, we followed the recommendations of Lou and Therkildsen (2022) to use stringent filters for excluding sites with mapping quality < 33 and base quality < 30. We included the "baq − 1" filter to measure base alignment quality and exclude false SNP calls due to misalignments around insertions and deletions (Li 2011).

Library preparation for sequencing
Initial testing of the Therkildsen and Palumbi (2017) and our modified method revealed that doubling the ratio of tagmentation enzyme to DNA, increasing the tagmentation time to 20 min, and decreasing the indexing PCR elongation time to 30 s resulted in maximization of fragments in the target distribution, both pre size selection (Fig. 2a) and post size selection (Fig. 2b).
Using these modified methods, we successfully prepared one library of 50 individuals from American Redstart feather DNA. The library had a final concentration of 2.56 ng/μL, a molarity of 8.56 nM, and an average library size of 453 bp.

Bioinformatic analysis and quality checking
The feather sequence data were mapped to a reference genome that had 1,176,890,789 bases. The mean mapped depth per individual was 1.3x and this corresponded to a mean of 65.5% of the reference genome having at least one Fig. 2 Tapestation 2200 gel of pooled feather library prior to final size selection (a) and after final size selection (b). Yellow bands show preferred fragment range (320-500 bp) for HiSeq 4000 as recommended by Novogene 1 3 read (i.e. sequencing breadth) per individual. Sequencing breadth nearly doubled from the individual with the lowest coverage depth of 0.6x (41.0% breadth) to the individual with the highest coverage depth of 2.0x (79.0% breadth; Fig. 3a). The patterns of sequencing depth versus breadth in our data very similarly reflected the results from Therkildsen and Palumbi (2017), in which they calculated a mean mapped depth of 1.3 x and breadth of 66% across 876 individuals. Reads were successfully mapped to the reference genome consistently across individuals with a mean of 88.9% (range 85.9% to 90.0%). The CollectWgsMetrics tool in Picard excluded a mean of 35.7% (range 30.1% to 41.2%) of bases due to the quality filters of base quality, mapping quality, unpaired reads, and duplicate reads (Fig. 3b). Sequencing depth was positively associated with the starting sample DNA quality (F-statistic 7.51 on 1 and 48 degrees of freedom, p value = 0.009, adjusted R-squared = 0.12) but not associated with the final sample DNA quality after standardization (F-statistic 1.58 on 1 and 48 degrees of freedom, p value = 0.22, adjusted R-squared = 0.01). However, the positive association between sequencing depth and starting sample DNA quality was weak, and even the sample with the smallest quantity of starting DNA (0.48 ng/μL) had mean sequencing coverage of 1.2x.
ANGSD identified a total of 5,740,312 SNPs with a global minor allele frequency of > 5% across all samples. The large number of SNPs detected are suitable data for a variety of downstream analyses that can incorporate the genotype probabilities outputted by ANGSD. Depending on the population sample size, data with similar low coverage would be suitable for applications relevant to conservation genetics such as estimating population structure, population genetic diversity, and genotype-environment association methods (Lou et al. 2021).

Discussion
In avian conservation genetics, researchers must often work from DNA samples with low yield, especially in the case of noninvasive or minimally invasive sampling. These types of samples can be a challenge to use with many genomic techniques, such as WGS, which typically require high input DNA quantities. Yet, collecting high DNA yield samples (like blood and tissue) can present ethical, logistical, and financial roadblocks.
The potential knowledge gained from utilizing WGS and similar genomic tools can provide insights not yet achievable from other methods. WGS can be a powerful tool for management in conservation efforts but, until recently, has been challenging and or cost-prohibitive when working with low total gDNA samples such as feathers. Here we have described a method for cost-effective WGS library preparation that can be used for feather samples. Results suggest that, with the modifications described above, one can successfully produce high quality sequence data from DNA with input as low as 0.48 ng, for a fraction of the cost of traditional library preparation methods. More specifically with this method, we were able to prepare samples for approximately $15 each, from DNA extraction to final QC prior to sequencing. We acknowledge that this exact cost will vary between institutions and vendor contracts. Our method will allow avian conservation biologists to apply WGS methods to previously underutilized sample types, like feathers. Another potential challenge of using feather DNA with WGS methods is biased or incomplete amplification of the genome due to the low number of copies of the entire genome present (Meynert et al. 2014). To assess this, we looked at the breadth of coverage in the final sequences in relation to the sequencing depth. Our mean breadth of genome coverage of 66% with a mean sequencing depth of 1.3x was nearly identical to the results of Therkildsen and Palumbi (2017) comparing sequence data from 876 Atlantic silversides (Menidia menidia) to a transcriptome. Sequencing breadth nearly doubled from 41% in our sample with smallest depth (0.6x) to 79% in our sample with highest depth (2.0x). Similarly, Therkildsen and Palumbi (2017) demonstrated that their samples with > 3.5x depth achieved a mean sequencing breadth of > 90%. Interestingly, the sequencing depth of our samples was positively associated with the starting DNA quantity of the samples but not the final DNA quantity. While some variation in sequencing depth of data is expected, a large amount of variation may result in patterns of genetic variation that confound biological patterns (i.e. batch effects; Leek et al. 2010). Batch effects can be particularly impactful for low-coverage WGS and should be addressed based on the particular source of the effect (Lou and Therkildsen 2022). The stringent filtering on base and mapping quality during SNP calling, as employed in our study, is one such method to mitigate batch effects (Lou and Therkildsen 2022). SNP calling in our study resulted in the identification of over 5 million SNPs, providing a data set suitable for a wide-array of conservation genetic applications such as, but not limited to, identifying genetic differentiation, population structure, and genotypeenvironment associations. Analyses like these can be performed with low per-sample coverage as long as population sample size is large enough (e.g. > 10 individuals per population; Lou et al. 2021). With even higher per-sample coverage (e.g. > 2x), genotype imputation becomes a viable option for inferring missing data (Lou et al. 2021). A discussion of suitable analyses in relation to sampling size and sequencing coverage can be found in Lou et al. (2021). Overall, our results suggest that when our method is employed, libraries prepared from feathers produce sequence data of sufficient quality for conservation genomics applications such as identifying relatedness and patterns of homozygosity for conservation breeding programs.
When applying this protocol to other species and sample types, researchers may want to think about a few important considerations. First, the genome size of the study organism should be considered. The protocol presented herein is an excellent option for use with organisms with a small genome, as the protocol was optimized using DNA from birds which on average have a genome size of 1.1 Gb. For species with larger genomes, more sequencing is required to achieve the same level of coverage, and therefore may require optimized methods and will have a relatively higher cost as well. Additionally, the ratio of tagmentation enzyme to input DNA may need to be adjusted in order to maintain a similar average fragment size. The tagmentation enzyme is one of the most expensive components of this method (aside from extraction), so increasing the amount of enzyme per sample would also increase the per sample cost. Second, we have also tested this method with other sample types, including avian blood and tissue. These different sample types can be run on the same sequencing run, but researchers may want to take additional steps to ensure equal representation of the libraries. Specifically, performing a double size selection on the individual libraries prior to quantification and pooling can help ensure that the most accurate concentrations are used when pooling individuals for sequencing and, therefore, the most equal sequencing effort per sample will occur (unpublished data). Third, the second double size selection on the pooled library, although found to be beneficial by the authors, does present a risk of additional loss of DNA. In situations where this poses a serious concern and caution is preferred, a tapestation or other form of visual quality control can be used first to determine whether the concentration and average fragment length and distribution satisfies sequencing requirements or whether an additional double size selection would improve the distribution. If the distribution is satisfactory but the concentration is too low, other concentrating techniques could be utilized, such as speed vacs or a simple bead concentration step. Overall, with the aforementioned modifications taken into account, we believe the method presented here could be applied more broadly to a multitude of taxa.
Here we present a cost-effective method for producing WGS libraries using DNA from feathers, which are minimally-invasively collected samples. By providing an efficient, cost-effective WGS method for DNA from feathers, we hope avian conservation management efforts will be able to better take advantage of the applications WGS can provide for enhancing management efforts.