Identification of pedigrees with high risk of familial LC
To reduce possible bias, we adjusted both the case arm and control arm for sex, age, lung disease history, smoking index, living environment, and occupational exposure.
The pedigrees were divided into three groups according to the number of affected individuals among the first-degree relatives of the probands and spouses: 0, 1, and 2 or more (Table 1). As shown in the table, except for one subgroup with a small sample size in the control arm, all groups differed significantly between the case and control arms. The subgroup with at least two first-degree relatives affected by LC therefore appeared to be at the highest risk.
Table 1 Odds ratios for risk of lung cancer among first-degree relatives
Factors | Case/Control | Crude OR (95% CI) | Adjusted OR* (95% CI) | P-value |
Family history of any cancer | | | | |
No | 432/438 | 1.00 | 1.00 | |
Yes | 201/127 | 1.60 (1.24, 2.08) | 1.71 (1.28, 2.28) | <0.001 |
Family history of lung cancer | | | | |
No | 560/534 | 1.00 | 1.00 | |
Yes | 73/31 | 2.25 (1.45, 3.47) | 2.20 (1.36, 3.55) | <0.001 |
N of pedigrees with | | | | |
0 | 432/438 | 1.00 | 1.00 | |
1 | 149/111 | 1.36 (1.03, 1.80) | 1.55 (1.14, 2.12) | 0.002 |
≥2 any cancers | 52/16 | 3.30 (1.85, 5.86) | 2.65 (1.42, 4.94) | 0.001 |
N of pedigrees with | | | | |
0 | 560/534 | 1.00 | 1.00 | |
1 | 65/30 | 2.07 (1.32, 3.24) | 2.11 (1.29, 3.44) | 0.001 |
≥2 lung cancers | 8/1 | 7.63 (0.95, 61.20) | 4.49 (0.51, 39.27) | 0.029 |
* Adjusted for sex, smoking index, lung disease history, living environment, and occupational exposure.
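The crude odds ratios in Table 1 can be checked directly from the case/control counts. The following is a minimal sketch, assuming the confidence intervals were obtained with the standard Woolf (log-based) method; the source does not state which method was used, and the function name `crude_odds_ratio` is introduced here for illustration only. The adjusted ORs cannot be reproduced this way because they require the individual-level covariates.

```python
import math

def crude_odds_ratio(case_exposed, control_exposed,
                     case_unexposed, control_unexposed, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table.

    The CI uses the Woolf (log) method, which is an assumption here:
    the source does not say how its intervals were computed.
    """
    odds_ratio = (case_exposed * control_unexposed) / (control_exposed * case_unexposed)
    se_log_or = math.sqrt(1 / case_exposed + 1 / control_exposed
                          + 1 / case_unexposed + 1 / control_unexposed)
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))

# Counts taken from Table 1 (Case/Control):
# family history of lung cancer, Yes = 73/31, No = 560/534.
or_lc, lower_lc, upper_lc = crude_odds_ratio(73, 31, 560, 534)
print(f"{or_lc:.2f} ({lower_lc:.2f}, {upper_lc:.2f})")

# Pedigrees with >=2 lung cancers (8/1) against the same reference row.
or_two, lower_two, upper_two = crude_odds_ratio(8, 1, 560, 534)
print(f"{or_two:.2f} ({lower_two:.2f}, {upper_two:.2f})")
```

With only one control pedigree in the last row, the interval is very wide, which is consistent with the small-sample caveat noted for that subgroup.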
In Table 2, family history of disease did not differ significantly between patients with squamous carcinoma and patients with small cell LC. However, compared with squamous carcinoma, a family history of disease in first-degree relatives was associated with a significantly increased risk of lung adenocarcinoma (OR = 2.74, P = 0.018).
Table 2 Risk of family history on lung cancer stratified by histologic characteristics
Histologic characteristics | Family history of lung cancer: No, N (%) | Family history of lung cancer: Yes, N (%) | Adjusted OR* (95% CI) |
Squamous carcinoma | 111 (94.1) | 7 (5.9) | 1.00 |
Small cell carcinoma | 56 (94.9) | 3 (5.1) | 0.90 (0.22, 3.63) |
Adenocarcinoma | 427 (85.7) | 71 (14.3) | 2.74 (1.19, 6.31) |
* Adjusted for sex, smoking index, lung disease history, living environment, and occupational exposure.
Therefore, we identified pedigrees whose probands had adenocarcinoma and at least two first-degree relatives with LC as having the highest genetic risk. The affected individuals were biologically related (Supplementary Table S2).
We included five probands from familial LC pedigrees, as determined by epidemiological analysis, as a learning set (Fig. 1, red arrows). We also included three healthy individuals without a family history of any cancer as controls.
Shared somatic mutations and germline SNPs in the probands may not be associated with familial lung cancer
We performed WES on both cancer tissues and para-cancer tissues from the five probands. Each sample (cancer and para-cancer) yielded more than 100 million 100-nt reads. Of these reads, 82–85% were mapped to the human reference genome, indicating good overall sequencing quality (Supplementary Table S3). The exome capture kit captures 96 Mb of exon and UTR regions; therefore, the nominal average depth of the captured regions exceeded 91x (Supplementary Table S3), providing a good basis for SNV and SNP calling. We identified 727–1033 nonsynonymous somatic mutations (Supplementary Table S4), but none was shared by all five probands, suggesting that shared somatic mutations were not the cause of the high familial incidence of LC. No known driver mutations were found in the five probands, except for KRAS G12V in proband 5. These findings indicated that driver mutations may not explain the high incidence of LC.
We next identified 281 germline SNPs shared among all probands (Supplementary Table S5). However, few PPIs were found among these 281 genes according to STRING-DB; only three subgraphs contained more than five nodes (Fig. S1). No significant enrichment of interactions was found against the genetic background (P = 0.102), which is consistent with this network being a random sample from the genetic background. Gene ontology enrichment analysis by PANTHER showed no enrichment in “Biological Process” or “Molecular Function” terms (P > 0.05). KEGG pathway analysis likewise showed no significant enrichment in any pathway (P > 0.05). These results suggested that the shared germline SNPs were unlikely to be functionally relevant to LC.
Individual germline SNPs and PPI network patterns showed significant association with familial lung cancer
We next performed PPI analyses for genes containing germline SNPs in each proband and healthy control. Most of the genes containing germline SNPs in each of the five probands formed a large and interconnected PPI network main graph (Fig. 2A), whereas those from healthy controls formed much smaller PPI network graphs (Fig. 2B, C). These results demonstrated that germline SNP-containing genes in the probands tended to interact with each other, expanding the impact of SNPs throughout the system and indicating the robustness of the effect.
In addition to containing many more nodes, the proband main graphs had a much shorter path length than those of the healthy controls, except for healthy control #2, whose main graph was very small (Fig. 2D, E). Additionally, nodes in the proband main graphs had a significantly higher number of neighbors than those of the healthy controls (P = 0.0145, two-tailed Kolmogorov-Smirnov test, Fig. 2E). These results suggested that information in the proband main graphs could be rapidly transmitted to the entire network. Moreover, the degree distribution of the five probands did not strictly follow the power law: the number of medium-degree nodes was markedly higher than expected under the power law (Fig. 2F), indicating that these main graphs were more densely interconnected than a standard biological PPI network, which is typically described as a scale-free network that obeys the power law.
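The network quantities discussed here (main graph, path length, number of neighbors) can be illustrated on a toy example. The following is a minimal sketch using only the Python standard library; it takes the "main graph" to be the largest connected component, which is an assumption about the authors' definition, and the node labels `G1` to `G6` and their edges are hypothetical placeholders rather than genes or interactions from this study.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency map from a list of (gene_a, gene_b) interactions."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def main_graph(adj):
    """Largest connected component (assumed meaning of 'main graph')."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

def average_path_length(adj, nodes):
    """Mean shortest-path length over all node pairs in a connected node set."""
    total, pairs = 0, 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            node = queue.popleft()
            for nb in adj[node]:
                if nb in nodes and nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def mean_neighbors(adj, nodes):
    """Average number of neighbors per node within the node set."""
    return sum(len(adj[n] & nodes) for n in nodes) / len(nodes)

# Hypothetical toy interactions: one 4-node component and one 2-node component.
toy_edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G1"), ("G3", "G4"), ("G5", "G6")]
toy_adj = build_adjacency(toy_edges)
toy_main = main_graph(toy_adj)
toy_apl = average_path_length(toy_adj, toy_main)
toy_deg = mean_neighbors(toy_adj, toy_main)
print(sorted(toy_main), round(toy_apl, 3), toy_deg)
```

A larger main graph with a shorter average path length and more neighbors per node corresponds to the denser, more interconnected pattern described for the probands.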
Validation of individual germline SNPs and PPI network patterns in five other familial lung cancer patients
If this hypothesis were true, other members of the familial LC families, especially newly diagnosed patients with LC, should share similar germline SNP features because of their similar genetic backgrounds. The germline SNPs of five other similar probands, four healthy individuals from the familial LC families, and three patients with sporadic LC were used as a validation set. Like the five original probands, these five additional familial LC patients generally had many interconnected SNP-containing genes forming a large main graph, which contained more than 60% of the SNP-containing genes (Fig. 2G). This significantly distinguished these individuals from healthy controls (P = 0.0485, two-tailed Mann-Whitney U-test). We also tested three patients with sporadic lung adenocarcinoma. These patients had significantly fewer nodes in the main graph than individuals in the familial LC families (P = 0.018, two-tailed Mann-Whitney U-test), but were similar to the healthy individuals in the familial LC families (P = 0.70, two-tailed Mann-Whitney U-test).
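For readers unfamiliar with the Mann-Whitney U-test on groups this small, the following is a minimal sketch of an exact two-tailed version computed by enumerating all group assignments. The source does not state whether an exact or asymptotic P-value was used, and the numbers below are made-up illustrative values, not measurements from this study; the function names are likewise introduced for illustration.

```python
from itertools import combinations

def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def mann_whitney_exact(x, y):
    """Return (U for x, exact two-tailed P) by full enumeration."""
    ranks = midranks(list(x) + list(y))
    n, m = len(x), len(y)
    mean_u = n * m / 2

    def u_stat(indices):
        return sum(ranks[i] for i in indices) - n * (n + 1) / 2

    u_obs = u_stat(range(n))
    observed_dev = abs(u_obs - mean_u)
    assignments = list(combinations(range(n + m), n))
    extreme = sum(1 for idx in assignments
                  if abs(u_stat(idx) - mean_u) >= observed_dev - 1e-12)
    return u_obs, extreme / len(assignments)

# Hypothetical fractions of SNP-containing genes in the main graph.
toy_familial = [0.72, 0.68, 0.81, 0.65, 0.77]
toy_sporadic = [0.31, 0.42, 0.28]
u_value, p_value = mann_whitney_exact(toy_familial, toy_sporadic)
print(u_value, round(p_value, 4))
```

With five versus three observations there are only 56 possible assignments, so the smallest attainable two-tailed P-value is 2/56; this is why small validation sets limit how strong such a test result can be.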
SNP-containing genes in PI3K/AKT pathway
The highly interconnected SNP-containing genes in familial LC families suggested that these genes may act together more effectively by interfering with entire pathways, thus potentially elevating the risk of cancer. To verify this, we examined the top 10 KEGG pathways of the five probands, which shared only two pathways: “Pathways in Cancer” and the “PI3K/AKT Pathway” (Supplementary Table S6A). Similarly, both pathways appeared in the top 10 pathways of the five newly diagnosed patients with LC from other familial LC families. In sharp contrast, the PI3K/AKT pathway did not appear in the top 10 pathways in three of the four healthy individuals in familial LC families, potentially explaining why these individuals had not yet been diagnosed with LC at the time of participation in the study. This pattern was similar to that for the three healthy controls with no cancer in their families for three generations; only one of them had both pathways among the top 10 enriched KEGG pathways. We also analyzed the germline SNPs of three patients with sporadic LC. Interestingly, “Pathways in Cancer” was present in all three patients, whereas the PI3K/AKT pathway was identified in two.
In addition, nonsynonymous somatic mutations in the five familial LC probands and the three patients with sporadic LC showed the same trends in enriched pathways; that is, “Pathways in Cancer” or the “PI3K/AKT Pathway” appeared in the top 10 KEGG pathways (Supplementary Table S6B). This indicated that somatic mutations further reinforced the alterations in these pathways needed to drive the entire system into a cancerous state.
Number of SNP-containing genes in the PI3K/AKT pathway
The numbers of SNP-containing genes in “Pathways in Cancer” and the “PI3K/AKT pathway” were positively correlated (Fig. 3A). The data points were automatically clustered into two groups by unsupervised hierarchical clustering: all five probands and the five newly diagnosed patients with familial cancer had more than 15 SNP-containing genes in the PI3K/AKT pathway and more than 10 genes in “Pathways in Cancer”. In contrast, most healthy individuals (including all three healthy controls and three healthy individuals in familial LC families) and all three patients with sporadic LC had fewer SNP-containing genes in these two pathways. Thus, the number of genes in the PI3K/AKT pathway containing germline variations (> 15 genes) may be an important predictor of a high risk of LC. The optimal division line, indicated in Figure 3, was obtained with a simple SVM classifier.
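The counting step behind this criterion can be sketched as follows. This is a simplified illustration that applies the two thresholds stated in the text (more than 15 and more than 10 genes) as a plain rule; it is not the SVM division line of Figure 3. All gene identifiers are hypothetical placeholders, and the two placeholder pathways are kept disjoint for simplicity, although real KEGG pathways overlap.

```python
def count_pathway_hits(snp_genes, pathway_genes):
    """Number of distinct SNP-containing genes that belong to a pathway."""
    return len(set(snp_genes) & set(pathway_genes))

def flag_high_risk(pi3k_count, cancer_count, pi3k_cutoff=15, cancer_cutoff=10):
    """Simple threshold rule from the text; NOT the SVM boundary in Figure 3."""
    return pi3k_count > pi3k_cutoff and cancer_count > cancer_cutoff

# Placeholder pathway memberships (hypothetical identifiers, not KEGG content).
pi3k_pathway = {f"PI3K_{i}" for i in range(40)}
cancer_pathway = {f"CANCER_{i}" for i in range(40)}

# Two hypothetical individuals with different numbers of SNP-containing genes.
individual_a = ([f"PI3K_{i}" for i in range(18)]
                + [f"CANCER_{i}" for i in range(12)] + ["OTHER_1"])
individual_b = ([f"PI3K_{i}" for i in range(9)]
                + [f"CANCER_{i}" for i in range(6)])

results = {}
for name, genes in (("A", individual_a), ("B", individual_b)):
    n_pi3k = count_pathway_hits(genes, pi3k_pathway)
    n_cancer = count_pathway_hits(genes, cancer_pathway)
    results[name] = (n_pi3k, n_cancer, flag_high_risk(n_pi3k, n_cancer))
    print(name, n_pi3k, n_cancer, results[name][2])
```

In practice, the pathway gene sets would come from a pathway database and the cutoffs would need validation in a larger cohort before being used for prediction.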
The functions of these SNPs have not been investigated thoroughly. Nevertheless, we subjected the PI3K/AKT pathway SNPs of the five probands to functional prediction and database searches. In the ClinVar database, 6–10 SNPs were recorded as non-“benign” (e.g., pathogenic, conflicting interpretations of pathogenicity, or risk factor; Fig. 3B), indicating that these SNPs are potential risk factors for diseases, most of which are tumors. In the COSMIC database, nearly half of these SNPs have been reported as somatic mutations in cancer (Fig. 3C), suggesting that these mutations might contribute to malignancy. We also predicted the functions of these SNPs using the SIFT and PROVEAN tools (Choi, 2015). PROVEAN predicted 28.1–51.2% of the SNPs to be “damaging”, meaning that these SNPs are predicted to alter protein structure and may therefore lead to significant functional changes. These results suggested that the PI3K/AKT SNPs of these familial LC patients may contribute to systemic and functional risk.
One individual had many germline variations in the PI3K/AKT pathway
Notably, one healthy individual in a familial LC family (marked with arrows in Figure 2G, Figure 3, and Supplementary Table S6) exhibited features identical to those of patients with familial LC, including a large and interconnected main graph of germline SNP-containing genes (Fig. 2G), “Pathways in Cancer” and the PI3K/AKT pathway as the top two KEGG pathways (Supplementary Table S6), and 24 and 20 SNP-containing genes in the two pathways, respectively (Fig. 3). One year after her initial enrollment in this study, cancer lesions were detected in her lungs, and adenocarcinoma was diagnosed pathologically. Although more cases are needed to confirm this finding, this case indicated the feasibility of using such criteria to predict the incidence of familial LC.