Investigation on gastric cancer susceptibility genes in Chinese early-onset diffuse gastric cancer

Abstract Background
Diffuse gastric cancer (DGC) is gastric cancer with diffuse-type histological morphology. Early-onset DGC raises suspicion of hereditary diffuse gastric cancer (HDGC). HDGC patients are often less sensitive to chemotherapy and present with highly invasive, late-stage malignancy with poor prognosis. This hereditary cancer syndrome is closely associated with deleterious CDH1 germline variants. In addition, potentially pathogenic germline variants in other candidate genes have been observed in HDGC families. However, the profile of gastric cancer (GC) susceptibility genes in Chinese HDGC patients has yet to be elucidated.

Methods
To investigate gastric cancer susceptibility genes in Chinese gastric cancer patients, we collected peripheral blood samples from 29 patients who fulfilled both the HDGC clinical diagnosis and genetic testing criteria updated in 2015 by the IGCLC. Genomic DNA was extracted from peripheral blood, followed by Whole Exome Sequencing (WES) with an average sequencing depth of at least 100X. SNV and InDel candidates were filtered based on population allele frequencies in public nucleotide polymorphism databases, and a pathogenic variant filtering process was performed to identify potential gastric cancer predisposing variant candidates. Immunohistochemistry (IHC) was conducted on biopsies from the patient who carries a potentially pathogenic CDH1 germline variant.

Results
In general, 336296 germline non-synonymous variants were detected in 29 GC patients. According to the pathogenic variant filtering process, 25 germline variant candidates in 22 genes were identified as gastric cancer predisposing variant candidates. In addition, a novel splice-site variant in CDH1, c.2165-1G>A (NM_004360.4), was detected in a Chinese early-onset HDGC patient who developed ovarian metastasis. Furthermore, IHC analysis of the ovarian cancer tissue from this patient demonstrated weak to moderate staining of E-cadherin compared with the positive control. Moreover, another two variants, CTNNA1

Conclusions
This study suggests that CDH1 c.2165-1G>A may act as a gastric cancer predisposing variant. In addition, the 22 genes observed in this study may be considered in further investigation of the molecular mechanisms of early-onset gastric cancer. Furthermore, the inconclusive results of this study warrant future investigation of gastric cancer susceptibility genes, for which cohort selection may require more stringent conditions.

Background
Gastric cancer (GC) is the fifth most common malignancy and the third leading cause of cancer-related mortality worldwide (1, 2). Moreover, GC is the most prevalent cancer in eastern Asia, including China (3). Most GCs are sporadic, whereas 1-3% are caused by inherited cancer predisposition syndromes (4). The best characterised inherited GC is hereditary diffuse gastric cancer (HDGC), a diffuse-type cancer with multiple foci of signet-ring cell carcinoma (SRCC) underneath an intact surface epithelium (5). Patients with this manifestation are likely to have highly invasive, late-stage cancers that are not sensitive to chemotherapy, and they often have a poor prognosis.
Previous studies demonstrated that germline mutations in CDH1 are one of the risk factors for HDGC, owing to reduced or even lost function of the normal E-cadherin protein (5). CDH1 is a tumour suppressor gene that maps to chromosome 16q22.1 and consists of 16 exons. The genomic length of CDH1 is approximately 100 kb; it is transcribed into a 4.5-kb mRNA and translated into the 120-kDa E-cadherin, a highly conserved transmembrane glycoprotein that maintains calcium-dependent cell-cell adhesion through association with a cytosolic protein complex formed mainly by catenins (6,7). Susceptibility to gastric cancer in CDH1 mutation carriers is therefore attributed to aberrant E-cadherin function or protein loss, since normally functioning E-cadherin is required by various signalling pathways and their cross-talk to maintain the balance of cell proliferation, motility and polarity (7). In addition, the estimated cumulative risk for CDH1 mutation carriers of developing GC by the age of 80 years is 67% for men and 83% for women (8). Screening for CDH1 germline pathogenic variants was therefore included in the clinical criteria for HDGC diagnosis by the International Gastric Cancer Linkage Consortium (IGCLC) (9, 10).
The global incidence of gastric cancer is highly diverse, and the reason for this geographic difference remains to be elucidated. A similar discrepancy exists in the incidence of identified CDH1 germline variants. More than 100 different germline variants have been characterised in the CDH1 coding and splice site regions in families of different ethnic backgrounds worldwide (4,11,12). However, fewer CDH1 germline variants have been detected in families from areas with middle or high gastric cancer incidence, namely Italy (Tuscany), Portugal (North), Japan, China, Korea, South America, Lithuania and Poland, than in low-incidence countries (12). In addition, reports indicate that in low-incidence countries the detection rate of pathogenic CDH1 variants in patients fulfilling the IGCLC screening criteria ranges from 10% to 18%, whereas it drops to less than 10% in high-incidence areas (4). This may be due to differences in genetic background and environmental effects between ethnic groups, and may suggest that direct implementation of clinical management guidelines for HDGC, which are based primarily on Caucasian populations, may not be appropriate for Asian populations.
Considering that China accounts for about 42% of gastric cancer cases worldwide, investigations in the Chinese population are urgently needed. Furthermore, not all families fulfilling these criteria have mutations in CDH1, indicating the existence of other DGC predisposition genes (4). For instance, a germline truncating variant in CTNNA1 was present in two invasive diffuse gastric cancer patients (13), and Hansford et al. suggested that, apart from pathogenic germline variants in CTNNA1, a gene involved in E-cadherin signalling, other genes such as BRCA2, STK11, SDHB, PRSS1, ATM, MSR1 and PALB2 may also contribute to gastric cancer susceptibility (11). To improve the clinical management of this malignancy, studies exploring novel pathogenic variants in CDH1 and new susceptibility genes for HDGC in the Chinese population are required to further elucidate the molecular mechanism of HDGC. Early-onset gastric cancers present at an advanced stage with a high proportion of poorly differentiated carcinomas (14), and an early age at diagnosis may indicate familial or hereditary cancer attributable to genetic factors. Investigating this group of patients may therefore help to identify gastric cancer susceptibility genes.
Our study was performed to address the need for clinical data from the Chinese population, based on Chinese patients with early-onset GC, to clarify potential susceptibility gene candidates, and to explore potential associations between genotype and clinical features in this group of patients.

Patient information
A total of 29 patients were recruited from Peking University Cancer Hospital. To investigate pathogenic variants in CDH1 and potential susceptibility genes in Chinese HDGC patients, peripheral blood samples were collected from the patients. Family history was self-reported by patients, and clinicopathological information was collected from the electronic medical record system of Peking University Cancer Hospital (Table 1). In this cohort, 28 patients were diagnosed with DGC at an age younger than 40 years, whereas one patient, a first-degree relative of one of the 28 patients, was diagnosed with DGC at the age of 49 years. All 29 patients fulfilled the HDGC clinical diagnosis criteria updated in 2015 by the IGCLC. Written consent was signed by the patients, and the study was approved by the Medical Ethics Committee, Beijing Cancer Hospital.

Whole exome sequencing and potential pathogenic germline variant filtering process
Genomic DNA was extracted from peripheral blood samples, and whole exome regions were captured, followed by library enrichment. Qualified libraries were sequenced by 150-bp paired-end Whole Exome Sequencing with an average sequencing depth of at least 100X on the Illumina HiSeq 2500 platform. Reads were aligned to the human reference genome GRCh37, and the Genome Analysis Toolkit (GATK) was used for realignment and base recalibration. A Variant Call Format file was then generated by a standard pipeline.
Non-synonymous SNV and InDel variants with allele frequency (AF) below 0.01, or not included in public nucleotide polymorphism databases (1000_AF, 1000_EAS, NHLBI-ESP_AA/EA, gnomAD_AF, gnomAD_EAS), were preserved. Pathogenic variant candidates were then filtered by the following process (Figure 1):
1. Variants classified as pathogenic / likely pathogenic (P/LP) in ClinVar or as disease-causing mutations (DM) in HGMD were directly characterised as candidate variants;
2. SNVs predicted by both in silico tools (SIFT, PolyPhen-2) to have the highest-level impact on protein function were retained;
3. The retained SNVs, InDels and splice site variants were filtered by a candidate gene set consisting of gastric cancer susceptibility genes indicated in the literature and in NCCN guidelines (Table 2);
4. The filtered splice site variants were subsequently tested with a splicing signal prediction tool (HSF), and those predicted to be "disturbing splicing" were preserved as deleterious variant candidates.

Table 2. Candidate gene set (excerpt)
SDHB: likely pathogenic missense variant in SDHB, a Cowden-like syndrome associated gene, detected in an HDGC patient with a history of early-onset lobular breast cancer (11,18)
PRSS1: possible pathogenic truncating mutation discovered in PRSS1, a gene associated with hereditary pancreatitis, in an HDGC patient (11,19,20)
ATM: potential deleterious mutations found in ATM in HDGC and FGC patients (11,21,22)
MSR1: possible pathogenic truncating mutation discovered in MSR1, a gene associated with increased risk of esophageal cancer and prostate cancer, in HDGC patients (11,23,24)
PALB2: potential pathogenic variants found in HDGC patients (11,15,25)
TP53: pathogenic variant in TP53 observed in an early-onset diffuse-type gastric cancer patient with a family history of breast cancer (22)
RHOA: RHOA germline variants found in DGCs with presence of poorly differentiated adenocarcinomas (26)

Figure 1. Process of identifying potential pathogenic germline variant candidates
False positive variants were excluded, and five reference allele frequency databases (1000_AF, 1000_EAS, NHLBI-ESP_AA/EA, gnomAD_AF, gnomAD_EAS) were used for initial filtering of non-synonymous variants. Variants with allele frequency less than 0.01, or not included in the reference databases, were preserved. The preserved variants were first filtered by ClinVar and HGMD, and those categorised as P/LP in ClinVar or DM in HGMD were directly classified as pathogenic germline variant candidates. The remaining variants were then sorted into SNVs, InDels and splice-site variants. SNVs with the highest-level impact on protein function, predicted as deleterious by SIFT and probably damaging by PolyPhen-2, were retained. InDels, splice variants and the retained SNVs were further filtered by a candidate gene set consisting of gastric cancer predisposing genes reported in the literature and in NCCN guidelines (Table 2). A splicing signal prediction tool (HSF) was employed to analyse the potential effect of the filtered splice site variants to further assist variant candidate identification. (P: pathogenic, LP: likely pathogenic, DM: disease-causing mutations)
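The filtering process in Figure 1 can be read as a sequence of predicates applied to each variant. The following Python sketch illustrates that reading only; it is not the pipeline used in the study. The `Variant` record, its field names, the label strings and the four-gene `GENE_SET` are assumptions made for illustration, and NHLBI-ESP_AA/EA is treated as a single database key for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

AF_CUTOFF = 0.01
AF_DATABASES = ("1000_AF", "1000_EAS", "NHLBI-ESP_AA/EA", "gnomAD_AF", "gnomAD_EAS")

# Illustrative subset only: the four Table 2 genes in which candidates were reported.
GENE_SET = {"CDH1", "CTNNA1", "APC", "BRCA1"}


@dataclass
class Variant:
    """Hypothetical annotated variant record (field names are assumptions)."""
    gene: str
    kind: str                                   # "SNV", "InDel" or "splice"
    af: Dict[str, float] = field(default_factory=dict)  # missing key = not recorded
    clinvar: Optional[str] = None               # e.g. "P", "LP", "VUS", "B"
    hgmd: Optional[str] = None                  # e.g. "DM"
    sift: Optional[str] = None                  # e.g. "deleterious"
    polyphen2: Optional[str] = None             # e.g. "probably damaging"
    hsf_disrupts_splicing: bool = False


def is_rare(v: Variant) -> bool:
    """AF below the cutoff, or absent, in every reference database."""
    return all(v.af.get(db) is None or v.af[db] < AF_CUTOFF for db in AF_DATABASES)


def is_candidate(v: Variant, gene_set=GENE_SET) -> bool:
    if not is_rare(v):
        return False
    # Step 1: ClinVar P/LP or HGMD DM variants are candidates directly.
    if v.clinvar in ("P", "LP") or v.hgmd == "DM":
        return True
    # Step 2: SNVs must be top-impact in both in silico tools.
    if v.kind == "SNV" and not (
        v.sift == "deleterious" and v.polyphen2 == "probably damaging"
    ):
        return False
    # Step 3: restrict to the candidate gene set.
    if v.gene not in gene_set:
        return False
    # Step 4: splice site variants must be predicted to disturb splicing.
    if v.kind == "splice":
        return v.hsf_disrupts_splicing
    return True
```

For example, a rare splice variant in a gene from the set is kept only if the splicing prediction flags it, whereas a variant with AF of 0.2 in any database is dropped even if it is labelled pathogenic.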

Immunohistochemistry
The slides were baked at 60°C for 2 h prior to the high-throughput IHC procedure and were then deparaffinized by sequential washing with xylene, graded ethanol and water. Antigens were retrieved for 15 min at 95°C. Endogenous peroxidase was blocked with 3% H2O2 for 30 min. Nonspecific staining was blocked using 10% normal goat serum (in 1× PBS) for 1 h at 37°C. The slides were incubated overnight at 4°C with CDH1 antibody (diluted in 1× PBS). The enhancing step, followed by incubation with the secondary antibody (1 h at room temperature) and the diaminobenzidine (DAB) substrate (5 min at room temperature), was performed according to the protocol of the ABC kit (DAKO).
Haematoxylin was used as a counterstain in the last step. The slides were then rinsed, cleared, and mounted. The staining was optimized based on negative and positive controls.

Clinicopathological characteristics of HDGC patients
The family history and clinical characteristics of the 29 patients in this cohort are summarised in Table 3.

Identification of potential pathogenic variant candidates and genes
In general, 336296 non-synonymous variants (SNVs and InDels) were detected in 29 GC patients. Germline variants with allele frequency less than 0.01, or not included in the five public population databases (1000_AF, 1000_EAS, NHLBI-ESP_AA/EA, gnomAD_AF, gnomAD_EAS), were preserved for further investigation of potential pathogenic variant candidates. Filtered variants identified as disease-causing in HGMD or pathogenic/likely pathogenic in ClinVar were prioritised as potential deleterious variant candidates; this group consisted of 17 missense and 2 splice donor variants located in 18 genes. Furthermore, the association between gastric cancer susceptibility and the 18 prioritised candidate genes was manually reviewed using GeneCards, the NCBI database and a literature review. The remaining variants were further analysed according to the pathogenic germline variant candidate filtering process described in the Methods. As a result, 3 SNVs, 1 InDel and 2 splice site variants in 4 genes were selected as potential germline pathogenic variant candidates for GC. In total, 25 germline variant candidates in 22 genes were observed in 20 GC patients (Table 4).
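As a quick arithmetic check, the counts reported for the two filtering branches can be tallied as below. The sketch assumes that the 18 genes from the ClinVar/HGMD branch and the 4 genes from the gene-set branch do not overlap; the variable names are illustrative. The tally gives 25 variants in 22 genes, the figures stated in the Abstract and the Discussion.

```python
# Branch 1: ClinVar P/LP or HGMD DM variants (reported in 18 genes).
clinvar_hgmd_branch = {"missense": 17, "splice donor": 2}
# Branch 2: in silico, gene set and HSF filtering (reported in 4 genes).
gene_set_branch = {"SNV": 3, "InDel": 1, "splice site": 2}

total_variants = sum(clinvar_hgmd_branch.values()) + sum(gene_set_branch.values())
total_genes = 18 + 4  # assumes the two gene groups are disjoint

print(total_variants, total_genes)
```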

Potential deleterious CDH1 splice site variant
No known pathogenic variant in CDH1 was discovered in the 29 GC patients. However, a splice site variant, c.2165-1G>A (NM_004360.4), was found in an EBV- and Helicobacter pylori (HP)-negative, mixed-type, early-onset (age at diagnosis: 31 years) GC patient who developed peritoneal and ovarian metastases. This variant is reported as a pathogenic somatic mutation associated with breast carcinomas in the somatic mutation database COSMIC, suggesting possible pathogenicity. However, it is recorded neither in ClinVar nor in any population database. Furthermore, two splice site variants at the same position, c.2165-1G>T and c.2165-1G>C, have each been classified as likely pathogenic in ClinVar by single submitters (27, 28).
Since this variant alters the canonical splice acceptor site immediately upstream of the exon beginning at c.2165, normal splicing might be disrupted, affecting normal expression of E-cadherin.
Therefore, we performed immunohistochemistry to detect E-cadherin expression in the metastatic ovarian cancer tissue of this patient (#RB809). Compared with the positive control, GC tissue with normal E-cadherin expression, staining on the cell membrane was sporadic and weak to moderate in the patient carrying the heterozygous germline c.2165-1G>A variant (Figure 2).
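Under the HGVS convention, a name such as c.2165-1G>A refers to the nucleotide one position upstream of coding position 2165, that is, the last base of the preceding intron, while a name such as c.-19+1G>A refers to the first intronic base after position -19 of the 5' untranslated region. The Python sketch below shows how such names can be split into their parts. It handles only simple substitutions, and the function name `describe` and the location labels are assumptions for illustration, not part of any annotation tool used in the study.

```python
import re

# Simple coding-DNA substitution: c.<anchor>[<intronic offset>]<ref>><alt>
HGVS_SUB = re.compile(
    r"^c\.(?P<anchor>[-*]?\d+)(?P<offset>[+-]\d+)?(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def describe(hgvs_c: str) -> dict:
    """Split a simple HGVS c. substitution and label its position."""
    m = HGVS_SUB.match(hgvs_c.replace(" ", ""))
    if m is None:
        raise ValueError(f"not a simple c. substitution: {hgvs_c!r}")
    offset = int(m["offset"]) if m["offset"] else 0
    if offset in (-1, -2):
        location = "canonical splice acceptor site"
    elif offset in (1, 2):
        location = "canonical splice donor site"
    elif offset != 0:
        location = "intronic"
    else:
        location = "exonic (coding or UTR)"
    return {
        "anchor": m["anchor"],
        "offset": offset,
        "ref": m["ref"],
        "alt": m["alt"],
        "location": location,
    }


for name in ("c.2165-1G>A", "c.-19+1G>A", "c.2053G>A"):
    print(name, describe(name)["location"])
```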

Other potential deleterious variants identified in candidate genes associated with GC susceptibility
Another candidate variant identified in CDH1 was a heterozygous missense variant (c.2053G>A, p.Val685Met) located in the extracellular domain. This variant was previously identified in 1 of 188 healthy Hispanic individuals in a whole genome sequencing project (29). However, given that participants in that study were younger than 50 years, and that pathogenic variants in CDH1 only confer an increased lifetime cancer risk, the unaffected status of that individual may not be representative of a control population. Moreover, this variant is absent from the Chinese population in the 1000 Genomes Project. Nevertheless, despite the damaging effect predicted by both in silico tools (SIFT and PolyPhen-2), functional studies are required to determine its potential impact on normal E-cadherin function.
A frameshift variant in CTNNA1 (c.1975_1976del, p.Glu659ArgfsTer42) was observed in a male diffuse-type gastric cancer patient (#RB684) diagnosed at the age of 28 years. This variant is recorded in neither ClinVar nor HGMD, and is not present in population databases. In addition, a study reported two CTNNA1 variants (nonsense and frameshift) in 2 unrelated HDGC families, both of which showed loss of α-catenin protein expression and preserved E-cadherin expression. α-Catenin is a critical component of the adaptor protein complex that bridges E-cadherin and the filaments of the cytoskeleton; it maintains cell stability and inhibits cell motility (7). Therefore, the CTNNA1 variant detected in this study may impair the normal function of the E-cadherin-catenin complex and contribute to diffuse-type pathology.
APC is a colon cancer susceptibility gene in which pathogenic germline mutations may also confer an increased GC risk on carriers. A 26-year-old female diffuse gastric cancer patient (#RB651) with a poorly differentiated phenotype carries a heterozygous splice donor variant in APC (c.-19+1G>A), which is included in neither ClinVar nor HGMD and is not recorded in any population database. However, her mother, a DGC patient (#RB797) diagnosed at the age of 49 years, does not carry this variant. Although the variant was predicted to disrupt normal splicing by a functional prediction tool, its potential pathogenicity still needs to be confirmed by in vitro study.
Two missense variants in BRCA1 (c.5518G>T, c.3448C>T), discovered in another two early-onset diffuse GC patients in this cohort (#RB805 and #RB665, respectively), passed the filtering as pathogenic variant candidates. One of them, BRCA1 c.3448C>T, has been reviewed by the ENIGMA expert panel and classified as "benign" in the ClinVar database, whereas the other is also recorded in ClinVar but is classified by a single submitter as a variant of uncertain significance (VUS) owing to the lack of functional studies to verify its pathogenicity.

Potential pathogenic SNVs identified in genes with little evidence of GC predisposition
Among the 18 prioritised genes filtered by step 1 (Figure 1), only POLH has been reported to be associated with a cancer-prone syndrome, which could increase gastric cancer risk (30), whereas five genes (GBA, SBDS, PRDM16, SPINK1, VWF) have a potential implication in gastric cancer. In addition, 6 genes (LIPH, GSS, SBF1, SLC22A5, TYR, HNF1A) have been reported to be associated with several solid tumours; however, their implication in gastric cancer has yet to be elucidated. Moreover, there is little evidence of an association between cancer and the remaining 5 genes (ATP7B, MYH7, F8, KCNQ4, PTS). Therefore, the pathogenicity of the highlighted candidate variants in these 18 genes should be interpreted with caution. In vitro functional studies or segregation analyses are required to further investigate their potential deleterious effect on gastric cancer predisposition.

POLH
POLH encodes polymerase that belongs to the Y-family of DNA polymerases, which functions to enable replication on damaged template DNA in an relatively error-free manner (31). Deleterious germline mutations on POLH may lead to a cancer-prone syndrome, xeroderma pigmentosum, variant type (XPV; MIM #278750) that follows an autosomal recessive inheritance manner (32). A germline missense variant in POLH gene was observed in a 35-year-old breast cancer female whose uncle developed breast cancer at age of 43 years (33). However, according to a case-control study on investigating association between candidate genes and breast cancer susceptibility, individuals carrying rare germline missense variants in POLH showed decreased risk of breast cancer (34). In addition, another casecontrol study reported a common germline variant in POLH (c.1783A>G) that was associated with risk of melanoma in Caucasian origin population (35).

GBA
GBA is a protein-coding gene located on chromosome 1 that encodes a lysosomal membrane protein (36). Mutations in GBA that lead to aberrant function may cause Gaucher disease and Parkinson disease (37). One study hypothesised that GBAP1, acting as a competing endogenous RNA, could promote GBA expression in GC through association with miRNA-212-3p, and suggested that high expression of GBA is associated with poor GC prognosis (38). In addition, in a GWAS, a SNP (rs4460629) in GBA was proposed as a causal variant for GC (39). However, to the best of our knowledge, GBA has not been reported as a GC susceptibility gene.

SBDS
SBDS encodes a highly conserved protein that plays an important role in ribosome biogenesis. Biallelic mutations in the SBDS gene have mostly been associated with Shwachman-Diamond syndrome (SDS, OMIM #260400), an autosomal recessive disorder with a high risk of myelodysplastic syndrome (MDS) and acute myeloid leukaemia (AML) (40). However, Jason et al. speculated that germline mutations in SBDS may increase cancer predisposition (41). A case report described a female patient diagnosed with SDS three months after birth who developed poorly differentiated ductal breast carcinoma at the age of 30 years. This patient carries two germline variants in SBDS, c.183_184TA>CT and c.258+2T>C, both inherited from her parents (42). Additionally, a nonsense germline variant in SBDS was detected in an early-onset gastric cancer patient with diffuse-type adenocarcinoma (22). This evidence suggests that deleterious SBDS germline mutations may increase cancer susceptibility.

PRDM16
PRDM16 is a protein-coding gene that resides in chromosome 1p36. PRDM16 plays critical roles in the maintenance of brown adipocytes and of adult hematopoietic and neural stem cells. Moreover, PRDM16 has been reported to be abnormally expressed and rearranged in various cancers, including GC (43-46). Additionally, inhibition of PRDM16 by hypoxia-induced miR-24 expression could promote GC cell proliferation and migration (47).

SPINK1
SPINK1 resides in chromosomal region 5q32 and encodes a 79-amino-acid peptide that includes a 23-amino-acid signal peptide. It is widely expressed in a variety of malignancies and is associated with poor prognosis in the majority of solid tumours (48). Serum levels of SPINK1 may help to distinguish groups at high risk of gastric cancer and to discriminate advanced gastric cancer (49). Additionally, protein expression of SPINK1 in tumour tissue may have prognostic value in gastric cancer patients (50).

VWF
VWF encodes von Willebrand factor, a large multimeric glycoprotein found in blood plasma, platelet α-granules and subendothelial connective tissue (51). VWF plays important roles in hemostasis by mediating the adhesion of platelets to subendothelial connective tissue and by association with blood clotting factor VIII. Dysregulation of VWF could therefore lead to hemostatic disorders or complications including stroke, myocardial infarction and diabetes (52). One study showed that cancer cell-derived VWF promoted gastric cancer metastasis (53).

LIPH
LIPH is a member of the mammalian triglyceride lipase family. The expression of LIPH was found to be involved in papillary thyroid carcinoma development (54), and the mRNA level of LIPH may act as a prognostic marker for esophageal adenocarcinoma, lung cancer and breast cancer (55-57).

CIRH1A
A study demonstrated that CIRH1A plays regulatory roles in the proliferation and apoptosis of colorectal cancer cells (58).

GSS
GSS, a protein-coding gene located on chromosome 20, encodes glutathione synthetase, whose deficiency may lead to an inborn error of glutathione metabolism with an autosomal recessive inheritance pattern (59). GSS is important for glutathione synthesis in several tissues and under stressful conditions (60). Seven intronic polymorphic variants in the GSS gene were reported to correlate with lung cancer survival (61). Additionally, GSS protein expression was increased in colonic tumour tissue compared with normal mucosa (62), and the protein level of GSS may assist the early detection of colorectal cancer (63). Moreover, SNPs in the GSS gene showed prognostic and predictive value in the treatment of bladder cancer (64).

SBF1
SBF1 encodes a member of the protein-tyrosine phosphatase family. Mutations in this gene have been reported to be associated with Charcot-Marie-Tooth disease 4B3 (65). Few studies have reported an implication of SBF1 in cancer. An in vitro study indicated that SBF1 expression correlated with gemcitabine sensitivity in head and neck squamous cell carcinoma (66).

SLC22A5
SLC22A5 (also known as OCTN2), located within a cluster region on chromosome 5q, encodes a sodium-dependent carnitine transporter that is expressed in various human tissues (67). Genetic mutations in SLC22A5 that impair normal expression of OCTN2 may cause systemic carnitine deficiency (68). A case-control study demonstrated that the G risk allele at SNP rs27437 in the SLC22A5 gene was associated with colorectal cancer risk (69). Studies in GIST patients suggested that investigation of the minor allele of SLC22A5 may help to optimise imatinib therapy (70,71).

TYR
TYR is a protein-coding gene located at chromosome 11q14.3. It encodes tyrosinase, a membrane glycoprotein widely expressed in mammalian tissues and essential for melanin production (72). Mutations in TYR that result in abnormal expression or dysfunction of tyrosinase often lead to oculocutaneous albinism, with different levels of severity in different ethnic groups (73). Studies identifying potential cancer susceptibility variants suggested that a minor allele variant, p.R402Q, in the TYR gene is associated with an increased risk of cutaneous cancer (74-76), whereas a heterozygous polymorphism of the TYR gene at codon 192 correlated with a decreased risk of prostate cancer metastases (77) and with an elevated risk of squamous cell skin carcinoma in a Caucasian population (78).

HNF1A
HNF1A, a protein-coding gene, was initially discovered in the liver and was subsequently found to be widely expressed in several tissues including the pancreas, kidney, intestine and stomach (79). Mutations in HNF1A are a common cause of maturity-onset diabetes of the young (MODY) (80). A panel sequencing study investigating colorectal cancer predisposing genes found a germline HNF1A variant, c.92G>A, in a colorectal cancer patient who met the Amsterdam II criteria (81). Moreover, another germline HNF1A variant, c.1018C>G, was detected in an early-onset female lung cancer patient (82). In addition, a germline deletion spanning exons 2 to 3 was observed in a family with MODY type 3 and primary liver cell neoplasia (83). A study of families with familial liver adenomatosis also found heterozygous truncating germline variants in HNF1A (84).

Table footnotes:
1. 2 GC cases regardless of age, at least one confirmed DGC
2. One case of DGC < 40
3. Personal or family history of DGC and lobular breast cancer, one diagnosis < 50
SDR: second-degree relative; TDR: third-degree relative; FDR: first-degree relative; "~" indicates approximate age of diagnosis; "NA" in Differentiation column and "x" in TNM stage column represent information not available.
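The three numbered criteria in the table footnote can be expressed as a small rule check. The sketch below is a simplified reading under stated assumptions: it takes the patient plus affected relatives as a flat list and ignores the degree of relatedness, and the `Case` record, the cancer labels and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Case:
    """One affected individual (patient or relative); hypothetical record."""
    cancer: str                # "DGC", "GC" (type not confirmed as diffuse) or "LBC"
    age: Optional[int] = None  # age at diagnosis, if known


def criteria_met(cases: List[Case]) -> List[int]:
    """Return the numbers of the footnote criteria satisfied by a family."""
    dgc = [c for c in cases if c.cancer == "DGC"]
    gastric = [c for c in cases if c.cancer in ("DGC", "GC")]
    lbc = [c for c in cases if c.cancer == "LBC"]
    met = []
    # 1. Two GC cases regardless of age, at least one confirmed DGC.
    if len(gastric) >= 2 and dgc:
        met.append(1)
    # 2. One case of DGC diagnosed before 40.
    if any(c.age is not None and c.age < 40 for c in dgc):
        met.append(2)
    # 3. DGC and lobular breast cancer in the history, one diagnosed before 50.
    if dgc and lbc and any(c.age is not None and c.age < 50 for c in dgc + lbc):
        met.append(3)
    return met


# A DGC patient diagnosed at 26 whose mother had DGC at 49.
print(criteria_met([Case("DGC", 26), Case("DGC", 49)]))
```

A real assessment would also need the degree of relatedness and pathological confirmation, which this sketch leaves out.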

Discussion
HDGC was first described in 1964. It is one of the familial forms of gastric carcinoma, characterised by multiple microscopic foci of intramucosal signet-ring cells, and is considered a highly invasive tumour with poor prognosis (85,86). Linkage and mutational analysis confirmed that germline mutations in CDH1 play an important role in the multi-generational incidence of this malignancy (87). Approximately a hundred other HDGC families were reported in the following decades (85). Moreover, studies suggested that the cumulative lifetime risks of developing gastric cancer by 80 years of age for male and female carriers of pathogenic CDH1 mutations are 40% to 67% and 63% to 83%, respectively. In addition, female carriers may have a 39% to 52% risk of developing breast cancer, predominantly of the lobular type (8,11,88). Considering the high penetrance of pathogenic CDH1 mutations, a consensus statement on the clinical diagnosis and management of HDGC was formed to identify this dominantly inherited familial cancer syndrome and to improve clinical practice. Three major characteristics, early onset, diffuse-type classification and family clustering, have thus been widely accepted as clinical diagnostic criteria for HDGC and as screening conditions for pathogenic germline variants in CDH1 (9). The consensus was later revised and updated to recommend that patients with DGC diagnosed before 40 years, or with two GC cases in first- and second-degree relatives with one confirmed DGC before the age of 50 years, or with a personal or family history of DGC and lobular breast cancer with one diagnosed before 50 years, be considered for a clinical diagnosis of HDGC and further evaluated by genetic testing to determine CDH1 status (4,10).
However, a study exploring pathogenic germline CDH1 mutations in Korean HDGC patients demonstrated that the detection rate of CDH1 carriers was only 4.3%, which is no more than half of the detection rate (10%-18%) in countries with a low prevalence of gastric cancer (4,89). This evidence suggests that the CDH1 germline mutation screening criteria, which were established on populations with a low prevalence of gastric cancer, may not be appropriate for identifying CDH1 pathogenic variants in HDGC patients in East Asian countries with a high incidence of gastric cancer.
The current study aimed to investigate novel susceptibility genes and potential pathogenic germline variants that may contribute to the occurrence and development of early-onset diffuse-type gastric cancer. In this cohort, 28 GC patients had a confirmed diagnosis of diffuse-type GC before 40 years of age, whereas one patient, diagnosed with DGC at 49 years, is a first-degree relative of one of the 28 GC patients. All 29 GC patients fulfilled the 2015 HDGC clinical diagnosis criteria. In this study, 25 pathogenic variant candidates in 22 genes were highlighted by our filtering process. Since diffuse-type GC may be attributable to genetic abnormalities in cell adhesion related to E-cadherin and WNT signalling (36), four variants discovered in this study, CDH1 c.2165-1G>A (NM_004360.4), CDH1 c.2053G>A (NM_004360.4), CTNNA1 c.1975_1976del (NM_001903.2) and APC c.-19+1G>A (NM_001127511.2), were suspected to be gastric cancer predisposing genetic factors. Although previously reported CDH1 pathogenic mutations were not observed in this study, a novel and possibly pathogenic splice site variant in CDH1, c.2165-1G>A, was discovered in one patient, suggesting a low detection rate of potential CDH1 pathogenic variants, 3.4% (1/29), in Chinese HDGC patients. This result is consistent with the Korean population study mentioned above (89), further indicating that genetic testing for germline mutations in CDH1 alone may not be enough to identify the genetic and molecular basis of HDGC in areas with a high incidence of gastric cancer, especially in China.
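The detection rate quoted above follows directly from the cohort counts. The sketch below reproduces the 1/29 calculation and, as an illustration that is not part of the study's analysis, adds a Wilson 95% interval to show how wide the uncertainty is for a cohort of this size; the function name and the choice of interval are assumptions.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.96):
    """Wilson score interval for a proportion k/n (z = 1.96 for about 95%)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


carriers, cohort = 1, 29
rate = carriers / cohort
low, high = wilson_interval(carriers, cohort)

print(f"detection rate: {rate:.1%}")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

With a single carrier among 29 patients the interval is broad, which is in line with the caution expressed in the text about drawing conclusions from this cohort alone.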
In addition, no convincing pathogenic mutations were discovered in predisposing genes associated with increased gastric cancer risk. One potential explanation is that the susceptibility genes that increase gastric cancer risk in the Chinese population may differ from those in the Caucasian population. Many cancer-predisposing gene candidates other than CDH1, such as CTNNA1, INSR, FBXO24, DOT1L, BRCA1, PALB2, RAD51C, MSH2, ATR, NBN and RECQL5, have previously been reported in families with HDGC of various ethnic backgrounds (11,13,15,25,90). However, the pathogenicity of such variant candidates requires further verification through functional studies and cosegregation analysis in family members. Families without a CDH1 germline pathogenic mutation who meet the HDGC clinical diagnostic criteria may benefit from cancer-predisposing gene panel screening. Therefore, considering the low detection rate of CDH1 germline mutations in early-onset DGC patients, genetic screening of Chinese gastric cancer patients who meet the HDGC clinical diagnostic criteria using a multi-gene panel that includes other gastric cancer-associated predisposing genes is likely to be more cost-effective.
Nevertheless, structural aberrations and large deletions and insertions could not be precisely detected by the whole exome sequencing analysis in this study. A combination of multiplex ligation-dependent probe amplification (MLPA) and RT-PCR analysis may therefore help identify more genetic alterations in Chinese HDGC patients, since this approach has previously identified a large genomic deletion (c.164-?_387+?del) in a Japanese family with familial gastric cancer, which results in the loss of exon 3 of CDH1 and a truncated E-cadherin product (91). This hypothesis requires verification in a large cohort study of consecutive Chinese gastric cancer patients.
Furthermore, the inconclusive results of this study warrant future investigation of gastric cancer susceptibility genes, in which cohort selection may require more stringent conditions than early-onset diffuse-type GC alone.

Conclusions
This study suggests that CDH1 c.2165-1G>A may act as a gastric cancer predisposing variant. In addition, the 22 genes highlighted in this study may be considered in further investigations of the molecular mechanisms of early-onset gastric cancer. Furthermore, the inconclusive results of this study warrant future investigation of gastric cancer susceptibility genes, in which cohort selection may require more stringent conditions than early-onset diffuse-type GC alone.

Ethics approval and consent to participate
Written informed consent to participate in the study was obtained from each subject in accordance with the principles of the Declaration of Helsinki. Each patient or the patient's family was fully informed of the investigational nature of this study and provided written informed consent. The study protocol was approved by the Medical Ethics Committee, Beijing Cancer Hospital (approval number #2018KT64).

Authors' contributions
YF participated in the data analysis and drafted and finalised the manuscript. MZ helped with data collection, data analysis and manuscript preparation. WL, SL, CQ, MM, MY and WC helped with sample and data collection. XX participated in the design of the study and in manuscript editing. SJ conceived the study, participated in its design and coordination, and helped to draft and edit the manuscript. All authors have read and approved the final manuscript.