Pedigree and patients
A family with congenital cataract was recruited from Shenzhen, Guangdong Province, China (Fig. 1). The family included five patients affected with NHS: the proband (III:4) and four other family members (I:2, II:3, II:7 and III:7). All patients underwent detailed medical history collection and physical examination. Disease history, family history, pregnancy history and ocular history were recorded for family members, together with Snellen visual acuity, best-corrected visual acuity, non-contact tonometry, slit-lamp examination, anterior segment examination and photography, and posterior segment examination. The study protocol was approved by the Ethics Committee of Shenzhen Eye Hospital and conducted according to the standards of the Declaration of Helsinki. All participants provided written informed consent.
DNA extraction and target region sequencing
DNA extraction: Peripheral blood samples (4–5 ml) were collected in EDTA anticoagulant vacuum tubes and stored at -20°C. Genomic DNA was extracted using [provide the information for the kit], and its concentration was determined using a NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific Co. Ltd., Boston, MA).
Genomic library construction: Genomic DNA was fragmented into 180–280 base pair fragments using an ultrasonic DNA oscillator. The fragment ends were repaired and phosphorylated, adaptors were ligated to each end, and the ligation products were purified using magnetic beads. After purification by agarose gel electrophoresis, suitable fragments were enriched by PCR amplification.
Target region gene capture: The library fragments were hybridized to biotin-labeled probes (whole-exon P039-Exome probes) and bound to beads through the biotin-streptavidin interaction. Nonspecifically bound DNA fragments were then washed away, leaving the target regions enriched.
NextSeq500 high-throughput sequencing: All sequencing was performed on a NextSeq500 (Illumina, San Diego, CA) using bridge amplification on a flow cell (Illumina). The NextSeq500 images the flow cell in every cycle. In each cycle, the growing strand can be extended by only the one base that is complementary to the template, and the identity of that base is determined from its distinct fluorescent signal. Repeating the cycle many times yields the complete nucleic acid sequence.
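The cycle-by-cycle logic of sequencing by synthesis can be illustrated with a toy sketch. It is not the instrument's base-calling software: the signal labels (`signal_1` to `signal_4`) and the function name are hypothetical, and strand orientation, cluster imaging and quality scoring are all ignored.

```python
# Toy model of sequencing by synthesis: one complementary base per cycle,
# identified by a base-specific signal. Signal names are hypothetical labels.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SIGNAL_FOR_BASE = {"A": "signal_1", "C": "signal_2", "G": "signal_3", "T": "signal_4"}
BASE_FOR_SIGNAL = {signal: base for base, signal in SIGNAL_FOR_BASE.items()}


def sequence_by_synthesis(template, cycles=None):
    """Return the bases read over successive cycles for a template strand.

    Each cycle incorporates exactly one base, the complement of the current
    template position, and that base is decoded from its signal.
    The read is reported position by position; orientation is ignored.
    """
    read = []
    for template_base in template[:cycles]:
        incorporated = COMPLEMENT[template_base]   # only the matching base extends
        signal = SIGNAL_FOR_BASE[incorporated]     # signal observed in this cycle
        read.append(BASE_FOR_SIGNAL[signal])       # base call from the signal
    return "".join(read)


toy_read = sequence_by_synthesis("ATGC")
```

The number of cycles sets the read length, so `sequence_by_synthesis("ATGC", cycles=2)` stops after two bases.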
Bioinformatics analysis
After low-quality reads and potential adaptor contamination sequences were filtered out, the primary sequencing data were aligned to the human reference genome (hg19) using Burrows-Wheeler Aligner software (BWA, version 0.7.10; http://bio-bwa.sourceforge.net/). The aligned data were processed through a standard information analysis pipeline (https://samtools.sourceforge.net/), including detection, annotation, and analysis of single nucleotide polymorphisms (SNPs) as well as insertion and deletion mutations (indels). The sequencing data were also analyzed to assess whether the sequencing depth was sufficient to cover the target regions. The Genome Analysis Toolkit (GATK; https://www.broadinstitute.org/gatk/) was used to identify SNP and indel loci. The reference databases were the human HapMap, dbSNP138, Exome Sequencing Project, and Exome Aggregation Consortium databases. Candidate causative genes were screened by stepwise filtering.
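Stepwise filtering of this kind can be sketched as a sequence of predicates applied in turn, with the number of surviving variants recorded after each step. The sketch below is only illustrative. The filter criteria, the 0.01 frequency cutoff, the field names and the toy variants are all assumptions and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    effect: str                    # e.g. "missense", "synonymous"
    max_population_freq: float     # highest frequency across reference databases
    carried_by_all_affected: bool


# Hypothetical set of effect classes treated as protein-altering.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


def stepwise_filter(variants, max_freq=0.01):
    """Apply each filter in order; return survivors and per-step counts."""
    steps = [
        ("rare in reference databases", lambda v: v.max_population_freq < max_freq),
        ("protein-altering", lambda v: v.effect in PROTEIN_ALTERING),
        ("shared by affected members", lambda v: v.carried_by_all_affected),
    ]
    remaining = list(variants)
    counts = [("input", len(remaining))]
    for name, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        counts.append((name, len(remaining)))
    return remaining, counts


# Made-up variants for illustration only.
toy_variants = [
    Variant("GENE_A", "synonymous", 0.0, True),
    Variant("GENE_B", "missense", 0.20, True),
    Variant("GENE_C", "frameshift", 0.0, False),
    Variant("GENE_D", "nonsense", 0.0, True),
]
candidates, counts = stepwise_filter(toy_variants)
```

Recording the count after each step makes it easy to report how many variants each criterion removed.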
Validation of the candidate gene mutation by Sanger sequencing
The candidate NHS mutation was verified using Sanger sequencing. The coding regions of the gene were amplified and sequenced. PCR primers were designed using Primer 3.0 online software (Applied Biosystems ABI, Foster City, USA). The sequences of the forward and reverse primers were 5'-TTCGCCAAGCGGATCGTGGA-3' and 5'-TTAGGGTCAAGCGTGCTGAGGA-3', respectively. Sanger sequencing was also performed for all family members to determine whether the mutation co-segregated with the disease.
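The co-segregation criterion can be stated as a simple set comparison: every affected member carries the variant and no unaffected member does. The sketch below assumes this strict definition. The affected identifiers follow the pedigree labels, but the unaffected identifiers (`U1`, `U2`) and all carrier assignments are made up for illustration and are not results. For an X-linked gene such as NHS, carrier females with mild or no signs would need separate handling, which this sketch ignores.

```python
def cosegregates(affected, carriers, all_members):
    """Return True if the variant tracks the disease across the family.

    affected    -- IDs of clinically affected members
    carriers    -- IDs of members in whom sequencing found the variant
    all_members -- IDs of every sequenced family member
    """
    affected = set(affected)
    carriers = set(carriers)
    unaffected = set(all_members) - affected
    every_affected_carries = affected <= carriers
    no_unaffected_carries = not (unaffected & carriers)
    return every_affected_carries and no_unaffected_carries


# Hypothetical genotype calls, for illustration only.
affected_ids = ["I:2", "II:3", "II:7", "III:4", "III:7"]
family_ids = affected_ids + ["U1", "U2"]   # U1, U2: made-up unaffected members

consistent = cosegregates(affected_ids, affected_ids, family_ids)
with_unaffected_carrier = cosegregates(affected_ids, affected_ids + ["U1"], family_ids)
with_missing_carrier = cosegregates(affected_ids, affected_ids[:-1], family_ids)
```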
Functional prediction of the mutant protein
SOPMA was used to predict the secondary structure features of the mutant protein (NPS@: SOPMA secondary structure prediction, ibcp.fr). PSORT II was used to predict the subcellular location of the mutant protein (https://psort.hgc.jp/form2.html). The protein structure was predicted using the Phyre2 protein fold recognition server (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index).