2.1. Sampling and DNA extraction
Peripheral blood samples were collected by a local veterinarian from 59 wildcats and domestic cats with the consent of the local village administrations (Additional file 8: Table S1). Of these, 40 cats were sampled in villages on the off-shore islands that make up the Lamu complex in the EAC, Kenya, after authorization for research was obtained from the Department of Veterinary Service of Kenya (RES/POL/VOL.XX.VII/162), and 18 were Iranian feral cats (eight from the north and 10 from the south of the country). All experimental procedures were approved by the Animal Care and Use Committee of Kunming Institute of Zoology (SMKX2017007), and the methods were carried out in accordance with the approved guidelines. In addition, a single tissue sample was obtained from a wild cat captured by B. A. during a small carnivore survey in Central Kenya. This animal was collected as a voucher specimen and deposited in the National Museums of Kenya.
2.2. Analysis of mitochondrial genome sequences
Genomic DNA was extracted from whole blood of the 59 wildcats and domestic cats by the standard phenol/chloroform method. Protocols for PCR amplification and sequencing of the ND5 & ND6 and D-loop regions of the mitochondrial DNA (mtDNA) genome are provided in the Appendix (see Appendix-supplementary material and methods for details). Both the light and heavy strands were sequenced. Electropherograms were visualized, edited and aligned against the reference sequence NC_001700 [13] using SEQMAN PRO in LASERGENE 7.1.0 (DNAStar, USA). Variants in the ND5 & ND6 and D-loop sequences were scored relative to this reference.
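Scoring variants relative to a reference can be illustrated with a minimal Python sketch. It is not the SEQMAN PRO procedure used here. It assumes a pairwise alignment in which both sequences have equal length and gaps are written as "-", and the function name `score_variants` and the toy sequences are hypothetical.

```python
def score_variants(reference: str, sample: str, offset: int = 1):
    """List differences between an aligned sample and its reference.

    Both strings must come from the same pairwise alignment (equal length,
    gaps as '-'). Positions are reference coordinates starting at `offset`.
    An insertion in the sample is reported at the preceding reference base.
    """
    if len(reference) != len(sample):
        raise ValueError("reference and sample must be aligned to equal length")
    variants = []
    ref_pos = offset - 1
    for ref_base, alt_base in zip(reference.upper(), sample.upper()):
        if ref_base != "-":
            ref_pos += 1  # only real reference bases advance the coordinate
        if alt_base == "N" or ref_base == alt_base:
            continue  # skip ambiguous calls and matching sites
        variants.append((ref_pos, ref_base, alt_base))
    return variants


# Toy alignment (invented): one substitution, one insertion, one deletion.
toy_variants = score_variants("ACGT-ACGT", "ACTTGAC-T")
print(toy_variants)
```

Each tuple gives the reference position, the reference state and the sample state. Handling of heteroplasmy or low-quality base calls is outside the scope of this sketch.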
2.2.1. ND5 & ND6
The ND5 and ND6 comparative diversity study involved 143 DNA sequences [2] downloaded from GenBank (accession information is available in Additional file 9: Table S2). All 199 sequences (56 de novo [GenBank: MN313723 – MN313781] and 143 from GenBank, ranging from 2300 to 2527 bp) were aligned and trimmed to 2363 bp for analysis. The 199 ND5 & ND6 sequences were initially aligned with CLUSTALX 2.1 [19] and then checked by eye. Sequences were compared and haplotypes identified using DNASP 5.10.1 [20]. The model of substitution and related parameters were determined through the Bayesian information criterion [21] in JMODELTEST 2.1.4 [22]. A maximum likelihood (ML) tree was constructed from the 199 sequences (Additional file 9: Table S2) to visualize overall similarity using MEGA6 [23] with the TN93+I+G model of substitution selected by AIC in JMODELTEST 2.1.4. To discern possible genetic relationships between EAC-Lamu and Iranian cats and other Near East and Central Asia cats, an ML tree was constructed based on 159 sequences (56 de novo and 103 from Group IV). A median-joining network [24] for the 122 haplotypes found in these 159 sequences (56 de novo and 103 from GenBank) of wild and domestic cat samples from EAC-Lamu, Iran and Group IV was constructed with NETWORK 4.6.11 (http://www.fluxus-engineering.com).
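For orientation, the two information criteria mentioned above can be computed from a model's maximized log-likelihood lnL and its number of free parameters k as AIC = 2k - 2 lnL and BIC = k ln(n) - 2 lnL. The sketch below is only an illustration, not the JMODELTEST implementation. The log-likelihoods and parameter counts are invented, and taking the sample size n to be the alignment length (2363 bp) is an assumption.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 lnL."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, sample_size: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 lnL."""
    return n_params * math.log(sample_size) - 2 * log_likelihood


def best_model(candidates, sample_size, criterion="bic"):
    """Return (name of lowest-scoring model, all scores)."""
    scores = {}
    for name, (lnl, k) in candidates.items():
        if criterion == "bic":
            scores[name] = bic(lnl, k, sample_size)
        else:
            scores[name] = aic(lnl, k)
    return min(scores, key=scores.get), scores


# Hypothetical (lnL, k) pairs; these are NOT values from this study.
toy_candidates = {
    "JC": (-5200.0, 1),
    "HKY+G": (-5050.0, 6),
    "TN93+I+G": (-5046.0, 8),
}

print(best_model(toy_candidates, sample_size=2363, criterion="aic")[0])
print(best_model(toy_candidates, sample_size=2363, criterion="bic")[0])
```

With these invented numbers the two criteria prefer different models, because BIC penalizes each additional parameter by ln(n) rather than by 2.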
2.2.2. D-loop
Similarly, for the comparative study of D-loop mtDNA diversity as evidence of gene flow, 75 previously published cat DNA sequences [25, 26] were retrieved from GenBank [GenBank: AJ441317-AJ441319, AJ456977, AF348642, AB480177-AB480198, AB121148-AB121194] (Additional file 9: Table S2). All 129 sequences (54 de novo [GenBank: MH513143 – MH513196] and 75 from GenBank, ranging from 415 to 546 bp) were aligned and trimmed to 417 bp for analysis.
The 129 D-loop sequences were initially aligned with CLUSTALX 2.1 [19] and then checked by eye. Sequences were compared and haplotypes identified using DNASP 5.10.1 [20]. The model of substitution and related parameters were determined through the Bayesian information criterion [21] in JMODELTEST 2.1.4 [22]. An ML tree was constructed from the 129 D-loop sequences (Additional file 9: Table S2) to visualize overall similarity using MEGA6 [23] with the TrN+G model of substitution, which was the best model estimated by JMODELTEST 2.1.4. A median-joining network [24] for the 82 wild and domestic cat haplotypes was constructed with NETWORK 4.6.11 (http://www.fluxus-engineering.com).
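The trimming and haplotype-identification steps can be sketched in a few lines of Python. This is a simplified illustration, not the DNASP procedure. It treats aligned sequences as plain strings, so gaps and ambiguous bases count as ordinary characters, and the sample names and sequences are invented.

```python
from collections import defaultdict


def trim_alignment(aligned, start, end):
    """Keep alignment columns [start, end) for every sequence."""
    return {name: seq[start:end] for name, seq in aligned.items()}


def collapse_haplotypes(aligned):
    """Group sample names by identical (case-insensitive) sequence."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) > 1:
        raise ValueError("sequences must be aligned to equal length")
    groups = defaultdict(list)
    for name, seq in aligned.items():
        groups[seq.upper()].append(name)
    return dict(groups)


# Toy alignment (invented); the ragged ends are removed by trimming.
toy_alignment = {
    "cat_A": "ACGTACGTAA",
    "cat_B": "ACGTACGTCC",
    "cat_C": "ACGTTCGTGG",
    "cat_D": "acgtacgttt",
}
toy_trimmed = trim_alignment(toy_alignment, 0, 8)
toy_haplotypes = collapse_haplotypes(toy_trimmed)
for sequence, samples in toy_haplotypes.items():
    print(sequence, samples)
```

Here the four toy samples collapse into two haplotypes once the variable ends are trimmed. The number of groups returned corresponds to the haplotype count that would feed into a network analysis.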
2.3. Analysis of whole genomes
Based on the ND5 & ND6 network, eight samples (four EAC-Lamu and three Iranian samples sharing similar haplotypes, plus the wild cat from Central Kenya) were selected for whole-genome resequencing. The sequencing data from this study have been submitted to the Genome Sequence Archive (GSA, http://gsa.big.ac.cn/) under project number XXXXXXXX. Details on whole-genome sequencing, sequence data preprocessing and variant calling are given in Appendix-supplementary material and methods. We also incorporated published whole-genome sequencing data from 11 domestic cats (three Persian cats, two American cats, two Abyssinian cats, three Devon Rex cats and one Bengal cat) (http://felinegenetics.missouri.edu/99lives) [27] (Additional file 10: Table S3). These samples cover a range of domestic cat breeds from different regions. A maximum-likelihood phylogenetic tree was built from the 19 cat genomes using FastTree 2 [28] (Additional file 10: Table S3).
2.4. Estimation of demographic history
We used the Pairwise Sequentially Markovian Coalescent (PSMC) method developed by Li and Durbin [29] to infer the trajectory of the ancestral population of both wild and domestic cat genomes in response to Quaternary climatic change. PSMC has a high false-negative rate at low sequencing depth, resulting in systematic underestimation of true event times. Therefore, we selected the resequenced cat genomes with the highest read depth (Additional file 10: Table S3). We further used G-PhoCS [30] to infer the population history of wild cats, EAC-Lamu cats and Iranian cats as follows. First, we split the whole genome into segments of 1 kb. Next, we removed the regions with gaps less than 50 bases. Segments located in repeat regions were also removed. Finally, we filtered out segments within 50 kb of genes. Because of the long running time, we randomly selected 5,000 segments from all remaining neutral regions. G-PhoCS was then run for 10,000,000 iterations.
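The selection of putatively neutral segments described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: intervals are half-open (start, end) coordinates held in plain dictionaries, the chromosome name and annotations are invented, and the gap-based filter is omitted because its exact criterion would have to be assumed.

```python
import random


def make_segments(chrom_lengths, size=1000):
    """Tile each chromosome with non-overlapping segments of `size` bp."""
    segments = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length - size + 1, size):
            segments.append((chrom, start, start + size))
    return segments


def overlaps(start, end, intervals, pad=0):
    """True if [start, end) overlaps any interval widened by `pad` bp."""
    return any(start < i_end + pad and end > i_start - pad
               for i_start, i_end in intervals)


def neutral_segments(chrom_lengths, repeats, genes, size=1000, gene_pad=50_000):
    """Drop segments that overlap repeats or lie within `gene_pad` of a gene."""
    kept = []
    for chrom, start, end in make_segments(chrom_lengths, size):
        if overlaps(start, end, repeats.get(chrom, [])):
            continue
        if overlaps(start, end, genes.get(chrom, []), pad=gene_pad):
            continue
        kept.append((chrom, start, end))
    return kept


def sample_segments(segments, n, seed=0):
    """Randomly pick up to n segments (seeded so the draw is reproducible)."""
    if len(segments) <= n:
        return list(segments)
    return sorted(random.Random(seed).sample(segments, n))


# Toy genome (invented): one 200 kb chromosome, one gene, one repeat.
toy_lengths = {"chrToy": 200_000}
toy_genes = {"chrToy": [(100_000, 110_000)]}
toy_repeats = {"chrToy": [(2_500, 3_200)]}

toy_kept = neutral_segments(toy_lengths, toy_repeats, toy_genes)
toy_sample = sample_segments(toy_kept, 10)
print(len(toy_kept), len(toy_sample))
```

The linear scan over annotations is adequate for a toy example only. For a whole genome, an interval index or a dedicated tool such as bedtools would be more appropriate. In the setting described above, the sample size would be 5,000 rather than 10.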