2.1. Ethical statement
All animal study protocols were conducted in strict adherence to the Ethical Management of Laboratory Animal Welfare standards established by Zunyi Medical University. The animal experiment was approved by the Animal Center of Zunyi Medical University (Approval number: [2016] 2-059) and was carried out in accordance with ARRIVE guidelines.
2.2. Breeding of the Porcupine Family Line
The porcupines used in the experiment were acquired from Muxinyuan Ecological Agriculture Development Co., Ltd., situated in Suiyang County, Zunyi City, Guizhou Province. The facility holds a wild animal breeding license numbered Suidong 2013 Xunfan (010). For the experiment, we utilized 4 heterozygous mutant male porcupines and 8 wild type female porcupines as the F0 generation. These porcupines were paired for breeding in a 1 male to 2 female ratio to establish the family lineage. The offspring, comprising the F1 generation, were initially screened for traits, and those partially showing albino phenotypes in coat, skin, and sclera color were identified as heterozygous mutants. By interbreeding the F1 generation's heterozygous mutant individuals. The offspring of F2 generation were preliminarily screened, and the offspring showing complete albinism in their coat, spines, skin, and sclera, as well as a microphthalmia phenotype were identified as homozygous mutant. A partial genetic pedigree of the porcupine family is shown in Figure 1.
2.3. Collection of Skin and Inner Ear Tissues
Specimens were collected from 3 wild type porcupines, 3 heterozygous mutant porcupines, and 3 homozygous mutant porcupines, which were anesthetized with intramuscular injections of "Su-Mian-Xin," a specialized anesthetic provided by Dunhua Shengda Animal Pharmaceutical Co., Ltd., Dunhua, China, at a dosage of 0.1 ml/kg. Following anesthesia, the subjects were promptly decapitated, and with ear notchers, we excised 2×2 cm sections of auricle skin tissue. We then removed the surface hair with a scalpel and submerged the tissue samples in 4% paraformaldehyde before placing them in a 4°C refrigerator overnight for fixation.
2.4. Histology
The fixed tissues were rinsed in running water for several hours to wash away fixative residues, followed by dehydration through a graded ethanol series (70%, 80% and 90%) to remove moisture. Subsequently, the tissues were cleared in xylene, rendering them transparent and ready for paraffin embedding. The tissue blocks were then sectioned at 4 μm thickness using a microtome. These paraffin sections, placed on glass slides, were deparaffinized in xylene and rehydrated through a descending ethanol series. The sections were sequentially stained with hematoxylin, to target the nuclei, and eosin, to stain the cytoplasm and extracellular components. Following staining, the slides were dehydrated using an ascending ethanol series, cleared in xylene, and then mounted with a xylene-based adhesive under a coverslip. Finally, pigment deposition was observed under a microscope.
2.5. Auditory Brainstem Response (ABR) Assessment
ABR assessments were performed on wild type, heterozygous mutant porcupines and homozygous mutant porcupines using an auditory function tester within a soundproof and shielded room. Silver needle electrodes were utilized where the recording electrode was positioned along the line connecting the vertex of the porcupine's skull to the anterior edge of both ears. The reference electrode was inserted into the earlobe, while the ground electrode was secured at the tip of the snout. Stimulus tones comprised of brief clicks and short pure tones (tone burst) at frequencies of 0.5 kHz, 1 kHz, 4 kHz, 8 kHz, 16 kHz, and 32 kHz. The sound was emitted at a distance of 2 cm from the porcupine's ear, with each acoustic stimulus lasting 4 ms and subjected to a band-pass filter range of 300-3000 Hz. The short-tone stimulus operated at 12.1 beats/s with a duration of 0.1 ms, followed by a 4 ms short tone burst with rise and fall times of 0.5 ms. The evoked response potentials were amplified by a factor of 10,000 across an average of 1024 superimpositions. The apex stimulus sound intensity reached 120 dB SPL and was methodically attenuated from 120 dB SPL in decrements of 10 dB SPL until a consistent ABR waveform could no longer be elicited. The intensity was then increased in increments of 5 dB SPL to determine the threshold, which was defined as the lowest stimulus intensity at which consistent ABR waveforms were perceptible. This process was repeated three times, and the mean value was recorded as the ABR threshold for the porcupine14.
2.6. DNA Mixing Pool Construction
We constructed two offspring DNA pools based on phenotypic traits, distinguishing them into a heterozygous mutant group and a wild type group, rather than on genetic relationships. The parental generation comprised only one male heterozygous mutant individual. For the mutant group, auricular skin tissue samples were collected from 13 heterozygous mutant porcupines; for the normal group, samples were taken from 18 wild type individuals. Additionally, a sample was collected from one male heterozygous mutant individual. Each sample was placed in an individually labeled, sealed sampling tube. Tissue DNA was extracted using the Genomic DNA Extraction and Detection Kit from Tiangen Biochemical Technology Co., Ltd. (Lot: U9022), following the provided instructions meticulously. The samples were then sent to Beijing Novogene Technology Co., Ltd. for sequencing.
2.7. BSA Library Construction and Sequencing
A total amount of 1.5μg DNA per sample was used as input material for the DNA sample preparations. Sequencing libraries were generated using Truseq Nano DNA HT Sample preparation Kit following manufacturer’s recommendations and index codes were added to attribute sequences to each sample. Briefly, the DNA sample was fragmented by sonication to a size of 350bp, then DNA fragments were end polished, A-tailed, and ligated with the full-length adapter for Illumina sequencing with further PCR amplification. At last, PCR products were purified (AMPure XP system) and libraries were analyzed for size distribution by Agilent2100 Bioanalyzer and quantified using real-time PCR. These libraries constructed above were sequenced by Illumina HiSeq4000 platform and 150bp paired-end reads were generated with insert size around 350bp.
2.8. Mapping to reference genome
We first performed a quality assessment of the raw sequencing data, which included analyzing the error rate distribution and nucleotide content. Then, we removed the adapter sequences and filtered out reads of low quality and those with an N ratio (indicating bases that cannot be determined) greater than 10%. This process yielded clean reads suitable for subsequent analysis. The clean reads are mapped onto the reference genome (Mus musculus) employing the BWA software (Burrows-Wheeler Aligner)15. Furthermore, the command "rmdup" from SAMtools16 software is utilized to eliminate possible PCR duplicates. In scenarios where multiple read pairs share identical external coordinates, only the pair exhibiting the highest mapping quality is preserved. The basic information of the reference genome is shown in Table 1.
Table1. Basic information of reference genome.
Seq number
|
Total length
|
GC content(%)
|
Gap rate(%)
|
N50 length
|
N90 length
|
1136598
|
2,470,917,678
|
41.52
|
0.02
|
64,768
|
459
|
2.9. SNP and InDel Detection and Annotation
The variant identification process across all samples was performed using the Genome Analysis Toolkit (GATK) software's Unified Genotyper feature17. For SNPs, we applied the Variant Filtration parameter within GATK with the following configuration: --filter Expression set to "QD < 4.0 || FS > 60.0 || MQ < 40.0", -G_filter to "GQ<20", and –cluster WindowSize specified as 4.InDels were filtered using Variant Filtration parameters (settings: --filter Expression "QD < 4.0 || FS > 200.0 || Read PosRankSum < -20.0 || Inbreeding Coeff < -0.8"). ANNOVAR18, an effective software tool, annotated SNPs or InDels based on GFF3 files of the reference genome, given its powerful annotation capabilities and international recognition, so we used it to annotate the results of our assay.
2.10. SNP and InDel Calculations
We used one parent's genotype (1f1) as a reference to statistically analyze the number of reads corresponding to this genotype or others within the offspring pools. This method involves using sequencing reads to enumerate the nucleotides at each genomic position. It then compares these nucleotides to the reference genome. For a given site, we count the reads in the progeny pool that match or differ from the reference genome. The SNP-index is then calculated by dividing the number of differing reads by the total number of reads at that site, providing a measure of variation at that locus. In this context, a completely identical SNP-index equals 0, while a completely different SNP-index equals 1. The calculation of the InDel-index followed a similar methodology to that of the SNP-index.
To mitigate background noise, sequencing and alignment errors, and improve gene positioning efficiency, we meticulously screened and filtered SNP/InDel and SNP/InDel-index data. The screening and filtering process adhered to the following criteria:
1. Screening for markers indicating parental homozygosity within the parent population.
2. Excluding sites with an SNP/InDel-index less than 0.3 and SNP depth below 7 in both offspring groups.
3. Filtering out sites with a missing SNP-index in any offspring group.
2.11. Differential Analysis of SNP and InDel
To visualize the differences between the two offspring pools, the △(SNP-index) = SNP-index (S2)-SNP-index (S1) was calculated. We set the window to 1Mb and the step size to 1kb, calculating the mean value of △(SNP-index) to assess the △(SNP-index) distribution. Alongside the population type and the number of offspring mixing pools, a 1,000 permutation test was conducted19. The calculation of the △(InDel-index) followed a similar methodology to that of the SNP-index.
2.12. Identify candidate genes based on SNP and InDel
In order not to ignore the influence of micro-effective quantitative trait loci (QTLs), SNP and InDel loci with significant differences in their respective indices between two offspring were selected across the genome. That is, sites where the SNP index or InDel index of offspring 2 are close to 1.0 are selected, and sites where the SNP index or InDel index of offspring 1 are close to 0.0 are selected. For these candidate polymorphic loci, annotation results were extracted using ANNOVAR, prioritizing genes containing loci that resulted in termination loss, termination gain, nonsynonymous mutations, or alternative splicing as candidate genes.
2.13. Transcriptome Sequencing
The auricle skin tissue from adult wild type porcupines was sized, washed thrice with PBS, and then promptly cut, placed into a sampling tube, and stored in liquid nitrogen. This tissue was subsequently sent to Shanghai Zhongke New Life Biotechnology Co., Ltd. for transcriptome sequencing. Total RNA from the sample was extracted using the TRIzol method. After passing the quality test, mRNA was enriched with magnetic beads containing Oligo(dT), and Fragmentation Buffer was added to cleave the mRNA into short fragments of 200-300 bp. The RNA sample was then used as a template to synthesize the first strand of cDNA using 6-base random primers and reverse transcriptase. Subsequently, the second strand of cDNA was synthesized using the first strand as a template. The double-stranded cDNA was purified using AMPure XP beads. This purified double-stranded cDNA underwent end repair, A-tailing, and ligation with sequencing adapters. The fragment size was selected with AMPure XP beads, followed by PCR amplification to generate the final cDNA library. After the insert fragment length and the effective concentration of the library had been tested and met the qualifications, sequencing was performed using the Illumina HiSeq platform. Raw image data procured from Illumina's high-throughput sequencing undergoes base calling with CASAVA, transforming them into sequenced reads.
2.14. Transcript Splicing and MITF gene annotation
The sequencing data were then subjected to quality assessment, including error rate distribution and nucleotide content analysis. Adapter sequences were trimmed, and reads with low quality and undetermined bases (N) were filtered out to obtain clean reads for subsequent analyses20. Using Trinity software, the gold standard for assembling Illumina short-read sequences, we constructed reference sequences from clean reads. This process, comprising the Inchworm, Chrysalis, and Butterfly stages21, resulted in the assembled transcript sequences being saved in the FASTA format. Since the porcupine genome sequence has not been annotated, we first annotated the candidate genes for porcupine pathogenicity. In this study, we found that the phenotypic characteristics of the mutant porcupine were very similar to those of human WS2, and the homozygous mutant individuals exhibited typical microphthalmia phenotypic characteristics. We speculate that the pathogenic gene in this porcupine family is likely to be MITF. So we selected MITF as the candidate gene for porcupine pathogenicity and annotated it. We searched for the human MITF gene sequence information on Genbank, then compared the spliced porcupine MITF transcript sequence with the human MITF gene sequence. The region on the porcupine MITF gene with the highest consistency with the human MITF gene exon sequence was defined as the porcupine MITF gene exon region, resulting in the annotation of the porcupine MITF gene.
2.15. Localization of Candidate Mutation Sites in MITF Gene
By comparing the annotation results of the porcupine MITF gene with the candidate mutation sites screened by the BSA method, the mutation site on the MITF gene was accurately located. For further analysis, 23 porcupines from the family were selected, comprising 17 heterozygous mutants and 6 wild types, all of whom provided ear skin tissue samples. After extracting DNA from these samples, we designed specific primers flanking the mutation sites previously identified in the MITF gene. Sanger sequencing, along with co-segregation analysis, confirmed the link between phenotype and genotype. The primer sequences used were as follows: Forward: 5'TTTTCCTGACCTGGGTAACGTTC3' and Reverse: 5'AATCTGTTTGCCAAGGAATCTC3'.