Animals and tissues
Tissue samples of six pig breeds, BER, DUR, LAN, YOR, KNP, and SNU were collected and stored at -80°C in previous studies [35, 62]. Eight individuals from each breed were randomly selected. All experiments were approved and performed in accordance with the guidelines and regulations set by the Institute of Animal Care and Use Committee and the Center for Research Ethics of Konkuk University.
Preparation of genomic DNA
Genomic DNA was extracted from tissues (0.5 g) using a standard protocol. Briefly, tissues were incubated with 700 µL lysis buffer (50 mM Tris pH 8.0, 0.1 M EDTA pH 8.0, 0.5% (w/v) sodium dodecyl sulfate, 20 µg/mL DNase-free pancreatic RNase) and 20 µL of proteinase K (20 mg/mL) at 50°C overnight. Subsequently, gDNA was purified by phenol: chloroform: isoamyl alcohol (pH 8.0) extraction, precipitated using ethanol, and dissolved in 50 mM Tris-EDTA buffer (pH 8.0).
Determination of syntenic relationship
The genome assemblies and annotations used in this study were GRCh38.p14, Sscrofa11.1, ARS-UCD2.0, and GRCm39 for humans, pigs, cattle, and mice, respectively. The construction of a synteny plot was carried out using the Python package “pyGenomeViz” (github.com/moshi4/pyGenomeViz) on Python version 3.8.15 (www.python.org/).
Primer design
Primers that specifically amplify the entire coding sequence (CDS) of each of the nine MHC-linked OR genes were designed using NCBI Primer-BLAST (www.ncbi.nlm.nih.gov/tools/primer-blast/). The 500-bp upstream and downstream regions of the selected OR genes, based on the pig genome assembly (NC_010449.5, Sscrofa11.1), were used as queries. The maximum product size was set to 2500 bp. Primer sites containing nucleotide variations reported for dbSNPs (www.ncbi.nlm.nih.gov/snp/) were excluded. Off-target or multiple-target amplification of the primers was performed using BLAST analysis. Primer design was repeated until the optimal primers for the specific amplification of the target ORs were obtained. Primer information for the analysis of SLA-1, -2, and − 3 was based on previous studies [35–37] and is described in Table 1.
Polymerase chain reaction and sequencing
For a 10 µL reaction for nine OR genes including LOC100514111, LOC100516618, LOC100157348, LOC100515036, LOC100156552, LOC100522686, LOC100516811, OLF42-3, OLF42-1, the reaction mixture consisted of 1 µM of locus-specific primers (Table 1), 250 µM dNTPs, 1 unit of Supertherm™ Taq DNA polymerase (JMR Holdings, Kent, UK), 10× reaction buffer containing 15 mM MgCl2, and 25 ng of DNA. The amplification reaction was conducted on an ABI9700 thermocycler (Applied Biosystems, Foster City, CA) with an initial pre-denaturation step at 94°C for 3 minutes, followed by 30 cycles of denaturation at 94°C for 30 seconds, annealing at the specific primer annealing temperature (Table 1) for 45 seconds, and elongation at 72°C for 90 seconds. Subsequently, the final elongation was conducted at 72°C for 10 min. The amplicons were electrophoresed on 1% agarose gel to confirm target amplification. To sequence OR amplicons, sequencing primers with lengths of 15–20 bp were designed at conserved regions in both the forward and reverse directions with an overlap to cover the entire CDS of each OR gene using the Primer Designer of CLC Main Workbench version 7.8.1 (CLC bio, Aarhus, Denmark) (Table 1). Before sequencing, 5 µL of PCR products were mixed with 0.25 U of shrimp alkaline phosphatase (USB Corporation, Cleveland, OH), 15 U of exonuclease I (Fermentas, Massachusetts, USA), and incubated at 37 ℃ for 30 min. The sequencing reaction was performed using the Applied Biosystems BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Massachusetts, USA) with 2 pmol of a sequencing primer under the conditions of pre-denaturation at 96°C for 1 min, 25 cycles of 96°C for 10 s, 50°C for 5 s, and 60°C for 4 min. The reaction products were purified using ethanol precipitation, resuspended in 10 µL of Hi-Di™ Formamide (Applied Biosystems, Massachusetts, USA), and analyzed on an ABI3730 DNA Analyzer (Applied Biosystems). Specific amplification and sequencing of SLA-1, -2, and − 3 were conducted as previously described (Table 1) [35–37].
Allelic differentiation and haplotype determination
In principle, alleles that exist as homozygotes in sequence-based typing are considered novel alleles. Alleles that existed as heterozygotes in the typing results were separated into individual alleles through TA cloning using pGEM®-T Easy Vector Systems (Promega, Wisconsin, USA). For cloning, the ligation products were transformed into DH10B competent cells (Thermo Fisher Scientific, Waltham, MA, USA) using electrotransformation. Target inserts were amplified in 10 µL of colony PCR mixture containing a piece of a single bacterial colony as a DNA template, 1 µM of T7 and SP6 universal primers (Table 1), 250 µM dNTP, 1U of Supertherm™ Taq DNA polymerase (JMR Holdings, Kent, UK), and 10× reaction buffer with 15 mM MgCl2 (JMR Holdings, Kent, UK). The thermal profile for the colony PCR consisted of 5 min of pre-denaturation at 94°C, 30 cycles of denaturation at 94°C for 30 s, primer annealing at 50°C for 30 s, and elongation at 72°C for 60 s, followed by a final elongation at 72°C for 7 min. The PCR amplicons were sequenced using primers specific to each OR gene (Table 1). At least eight independent clones were sequenced for allelic determination using the T7 and SP6 universal primers. Allele differentiation for SLA-1, -2, and − 3 was conducted as described in previous studies [35–37]. Haplotype determination of SLA- and SLA-linked OR genes was conducted using the haplotype inference function, employing the EM algorithm provided in Arlequin version 3.5.2.2 [63]. The detailed procedure is shown in Supplementary Fig. 1.
Calculation of genetic diversity
The observed (Ho) and expected heterozygosities (He) of the loci were calculated using Arlequin version 3.5.2.2 [63]. Data normality and equality in variance distribution for the observed indices were tested using the Shapiro-Wilk normality test and an equal-variance test [64, 65]. Differences in the average values of the observed indices were tested using the Student’s t-test [66].
Population genetic analyses
LD index (ε) between loci was calculated after allele differentiation of genotyping results using eLD R script [67] on R version 4.1.2 (www.r-project.org). To generate an MSN plot, haplotype data was prepared for R library “adegenet” version 2.1.6 [68], and generation of a distance matrix and visualization was conducted using R library “poppr” version 2.9.4 [69]. A dataset for PCA was prepared using the genotype data of nine OR and three SLA class I genes from 48 individuals of six pig breeds, using the adegenet R package. PCA was conducted using the dudi.pca function of the R library “ade4” version 1.7.19 [70].
Estimation of haplotype breakpoints
To generate input nucleotide sequences for haplotype codon alignment, stop codons at the end of the full-length coding sequences (CDS) for 11 genes were removed, and the sequences were sequentially connected to produce a single sequence contig relative to the chromosomal order, except LOC100516811, a pseudogene. The input sequences were aligned using the codon alignment function of MUSCLE in MEGA version 11.0.10 [71]. The generated codon alignment was used as an input file for RDP v4.101 [72]. Haplotype breakpoints were deduced using RDP, GENECONV, MaxChi, BootScan, and SiScan in RDP software. Statistical significance of the breakpoints was tested using a breakpoint P distribution plot (1000 permutations and window size = 200bp). Breakpoints with an estimated P-value < 0.01 were suggested as breakpoint hotspots.
Codon selection test and LD calculation
The haplotype phylogenetic tree was constructed from the haplotype codon alignment using the "Create Tree" feature of CLC Main Workbench version 7.8.1 (CLC bio, Aarhus, Denmark). The neighbor-joining method and Kimura 80 nucleotide substitution model were applied with 5000 bootstrap replications. The created haplotype phylogenetic tree was unrooted using the R library “ape” version 5.6.2. [73] and was used as the input file for CodeML of pamlX version 1.3.1 [74]. All analysis options were set to default except for codon frequency, which was configured using the 2:F3ⅹ4 model. To test selection signatures of codons under Random-sites models, we used site models to allow the dN/dS (ω) ratio to vary among sites using Models 0, 1, 2, 7, and 8 under different assumptions on the distribution of ω ratio [74]. Empirical Bayes analysis using Models 2 and 8 was used to deduce the positively selected codons [75]. Codons with a confidence level > 95% were identified as positively selected codons. LD values between haplotype SNPs were estimated using Haploview version 4.2 from the haplotype codon alignment [76].
Comparison of OR expression
Expression data for SLA-linked OR genes were obtained from the NCBI Gene Expression Omnibus database (www.ncbi.nlm.nih.gov/geo/; accession numbers GSE171756 (pig testis) and GSE197184 (pig olfactory epithelium (OE)). Expression data for human ORs were retrieved from the same database with accession numbers GSE30611 (human testis) and GSE80249 (human OE). Because the OR expression data for porcine OE were available only as read counts, the values were converted to relative FPKM values for expression comparison using the following formula:
$$\:\text{O}\text{R}\:\text{g}\text{e}\text{n}\text{e}\:\text{F}\text{P}\text{K}\text{M}\:\text{i}\text{n}\:\text{O}\text{E}=\:\text{O}\text{R}\:\text{g}\text{e}\text{n}\text{e}\:\text{r}\text{e}\text{a}\text{d}\:\text{c}\text{o}\text{u}\text{n}\text{t}\:\text{i}\text{n}\:\text{O}\text{E}\:\times\:\left(\frac{\text{O}\text{R}\:\text{g}\text{e}\text{n}\text{e}\:\text{F}\text{P}\text{K}\text{M}\:\text{i}\text{n}\:\text{t}\text{e}\text{s}\text{t}\text{i}\text{s}}{\text{O}\text{R}\:\text{g}\text{e}\text{n}\text{e}\:\text{r}\text{e}\text{a}\text{d}\:\text{c}\text{o}\text{u}\text{n}\text{t}\:\text{i}\text{n}\:\text{t}\text{e}\text{s}\text{t}\text{i}\text{s}\text{}}\right)\:$$