Autoantibody Signatures Discovered by HuProt Protein Microarray to Enhance the Diagnosis of Lung Cancer

Background: This study aims to comprehensively discover novel autoantibodies against tumor-associated antigens (TAAbs) and to establish diagnostic models that assist in the diagnosis of lung cancer (LC) and the discrimination of pulmonary nodules (PN). Methods: The HuProt human protein microarray was used to discover candidate tumor-associated antigens (TAAs), and enzyme-linked immunosorbent assay (ELISA) was performed to detect the levels of TAAbs in 634 participants from two independent validation cohorts. Logistic regression analysis was used to construct models. Receiver operating characteristic (ROC) curve analysis was used to assess the diagnostic value of the models. Results: Eleven TAAs were discovered by protein microarray and data analysis. The levels of ten TAAbs (anti-SARS, anti-ZPR1, anti-FAM131A, anti-GGA3, anti-PRKCZ, anti-HDAC1, anti-GOLPH3, anti-NSG1, anti-CD84 and anti-EEA1) were higher in LC patients than in normal controls (NC) in validation cohort 1 (P<0.05). Model 1, comprising 4 TAAbs (anti-ZPR1, anti-PRKCZ, anti-NSG1 and anti-CD84) and CEA, reached an AUC of 0.813 (95%CI: 0.762-0.864) for distinguishing LC patients from normal individuals. Five of the 10 TAAbs (anti-SARS, anti-GOLPH3, anti-NSG1, anti-CD84 and anti-EEA1) differed significantly between patients with malignant pulmonary nodules (MPN) and patients with benign pulmonary nodules (BPN) in validation cohort 2 (P<0.05). Model 2, consisting of anti-EEA1, traditional biomarkers (CEA, CYFRA211 and CA125) and 3 CT characteristics (vascular notch sign, lobulation sign, mediastinal lymph node enlargement), could distinguish MPN from BPN patients with an AUC of 0.845 (sensitivity: 58.3%, specificity: 96.6%). Conclusions: High-throughput protein microarray is an efficient approach to discovering novel TAAbs, which could increase the accuracy of lung cancer diagnosis in the clinic.
Se: sensitivity, Sp: specificity, AUC: area under curve, 95%CI: 95% confidence interval, +LR: positive likelihood ratio, -LR: negative likelihood ratio, PPV: positive predictive value, NPV: negative predictive value, YI: Youden’s index, LM(-): negative lymph node metastasis, LM(+): positive lymph node metastasis, DM(+): positive distant metastasis, DM(-): negative distant metastasis, MPN: malignant pulmonary nodule, BPN: benign pulmonary nodule.


Introduction
More than 2.2 million new lung cancer (LC) cases and 1.8 million deaths were estimated to occur in 2021, accounting for 12.4% of all new cancer cases and 21.6% of all cancer deaths, respectively (1).
The 5-year survival rate for metastatic disease is 6%, whereas it can be up to 57% for localized LC (2). Low-dose computed tomography (LDCT), an effective LC screening approach, can significantly reduce LC mortality compared with X-ray examination (3,4). Owing to its heavy economic burden and high false positive rate, however, its use as a routine diagnostic method for LC is unrealistic (5,6). Given the mental and physical pressure on patients and their families, numerous studies have sought biomarkers with lower cost and excellent diagnostic utility that can identify LC patients at an earlier stage.
Tumor-associated antigens (TAAs) are aberrantly expressed or mutated proteins that stimulate a humoral immune response; the corresponding antibodies generated by the immune system are known as autoantibodies against TAAs (TAAbs). Because of their stability and specificity in serum, TAAbs have emerged as promising biomarkers and have been extensively investigated.
To date, several technologies have been commonly applied to the identification of novel TAAs, such as serological analysis of recombinant cDNA expression libraries (SEREX), serological proteome analysis (SERPA) and protein microarray. Although SEREX has contributed greatly to TAA identification, its application is limited: only highly abundant TAAs with linear epitopes can be identified, so other epitopes may be missed. SERPA, which combines 2-dimensional gel electrophoresis (2D-GE) with mass spectrometry (MS), has long been used to explore overexpressed TAAs and TAAs with post-translational modifications. Because it relies on 2D-GE, SERPA is restricted by the difficulty of obtaining reproducible 2D gels (7,8). Protein microarrays, which examine relationships among proteins and peptides on a large scale, have dramatically accelerated the high-throughput identification of TAAs for the development of tumor biomarkers (7,9). This approach has been widely applied to the identification of autoantibody signatures in a number of cancers (10-14). Our previous study showed that a panel of seven TAAbs, discovered via a focused protein array encoded by 154 cancer driver genes, had prominent diagnostic capacity for the detection of LC, with an area under the receiver operating characteristic curve (AUC) of 0.897, a sensitivity of 94.4% and a specificity of 84.9% (15).
The HuProt human microarray (CDI Laboratories) was originally developed by the Zhu laboratory at Johns Hopkins University and contained 16,368 unique full-length human proteins, representing 12,586 protein-coding genes (16). Pan and colleagues employed the HuProt v3.0 array, consisting of 20,240 full-length human proteins, to discover differentially expressed biomarkers between early LC patients and normal controls (NC) (14). In our research, the HuProt v3.1 protein microarray, based on 21,216 human proteins, was applied to screen TAAs with sera from LC patients and normal individuals. After verification of the TAAbs by ELISA in different sets of samples, the diagnostic value of the TAAbs in LC was evaluated, and the TAAbs were then combined with traditional biomarkers and CT parameters to construct models for distinguishing malignant pulmonary nodules (MPN) from benign pulmonary nodules (BPN).

Study population and serum collection
This study included 654 serum samples derived from the Biological Specimen Bank of Henan Key Laboratory of Tumor Epidemiology. The characteristics of all participants are described in Table 1. Three independent cohorts (discovery cohort, validation cohort 1 and validation cohort 2) were used in this study. In the discovery cohort, 10 LC sera and 10 NC sera matched by age and gender were tested by protein microarray. Validation cohort 1 comprised serum samples from 212 LC patients and 212 matched NC, and validation cohort 2 comprised serum samples from 105 MPN patients and 105 BPN patients.
All LC patients and the 105 BPN patients were recruited between November 2016 and November 2019 at the time of initial histopathological diagnosis, and none had received chemotherapy or radiation therapy. Normal individuals were recruited from the physical examination population from May 2019 to June 2019. A 5 mL peripheral blood sample was drawn from each participant, and serum was separated by centrifugation at 3000 rpm for 5 min and stored at -80 ℃ for further experiments.
CEA, CYFRA211 and CA125 concentrations in serum were detected by electrochemiluminescence immunoassay (Roche, USA). The 12 characteristic data of CT (number, diameter, edge, spiculation, vascular notch sign, lobulation, spines, pleural indentation, mediastinal lymph node enlargement, emphysema and calcification) were judged by two professional radiologists. The study protocol was approved by the Ethics Committee at Zhengzhou University, and all participants signed informed consent.

HuProt protein microarray and gene enrichment analysis
The HuProt™ v3.1 protein microarray was purchased from BCBIO technology (Guangzhou, China). The protein microarray was used to screen candidate autoantibodies for the diagnosis of LC. First, the microarray was blocked with 3% BSA at room temperature for 1 h. The blocked microarray was then incubated with a 1:200 dilution of serum sample as primary antibody at 4 ℃ overnight. After washing with PBST, the microarray was incubated with a 1:1000 dilution of secondary antibody at room temperature for 1 h in the dark. After washing with PBST and ddH2O, the microarray was dried and then scanned with a LuxScan 10K-A (CapitalBio).
The medians of the foreground (F532 Median) and background (B532 Median) intensity of each protein at 532 nm were read by the scanning instrument. The ratio of F532 Median to B532 Median (F/B) was defined as the signal-to-noise ratio (SNR) for the normalization of the microarray. Normalization among microarrays was conducted by z-score. The positive rate of each TAA in the LC or NC group is the number of samples whose SNR is higher than 4 (cutoff) divided by the total number of samples in that group. The analysis method for screening candidate proteins was as follows: Gene Ontology (GO) term and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses, based on Metascape software (https://metascape.org/), were applied to explore the pathways in which the candidate genes from the microarray were significantly enriched. Adjusted P < 0.01 was considered significantly enriched.
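The quantities defined above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the function names are hypothetical, the use of the population standard deviation for the z-score is an assumption, and the text does not state whether the cutoff of 4 is applied before or after z-score normalization, so the cutoff is simply applied to whatever values are passed in.

```python
from statistics import mean, pstdev

SNR_CUTOFF = 4.0  # cutoff stated in the text


def snr(f532_median, b532_median):
    """SNR of one protein spot: foreground median / background median (F/B)."""
    return f532_median / b532_median


def z_scores(values):
    """Z-score normalization of the values from one microarray.

    Assumption: population standard deviation; the text does not specify.
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def positive_rate(values, cutoff=SNR_CUTOFF):
    """Fraction of samples in a group whose value for one protein exceeds the cutoff."""
    return sum(v > cutoff for v in values) / len(values)


# Toy values for one protein (illustrative only, not study data).
lc_values = [5.2, 3.1, 4.5, 6.0]
nc_values = [1.2, 4.1, 2.0, 3.9]
print(positive_rate(lc_values), positive_rate(nc_values))
```

A candidate TAA would be one whose positive rate is clearly higher in the LC group than in the NC group; the exact selection criteria are not recoverable from the text above.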

Statistical analysis
GenePix Pro 6.0 was used to obtain the original data from the HuProt protein microarray. All ELISA data were analyzed and visualized with SPSS 25.0 and GraphPad Prism 8.0 software. A non-parametric test was used to compare the levels of TAAbs between LC patients and NC, and between MPN and BPN patients. The χ² test was performed to compare the frequency differences in the characteristics of each cohort. Logistic regression analysis was used to construct the models. ROC analysis was performed to evaluate the capability of each autoantibody and of the models in the diagnosis of LC and the discrimination of PN. For each autoantibody, the cutoff was defined as the OD value that gave the highest sensitivity at a specificity of more than 90%. The cutoff of each model was the predicted probability with the maximum Youden's index (YI). A two-tailed P value < 0.05 was considered statistically significant.
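The two cutoff rules described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the study used SPSS and GraphPad Prism, the function names are hypothetical, a value at or above the cutoff is assumed to be called positive, and the candidate cutoffs are assumed to be the observed values themselves.

```python
def sens_spec(cases, controls, cutoff):
    """Sensitivity and specificity when a value >= cutoff is called positive."""
    se = sum(v >= cutoff for v in cases) / len(cases)
    sp = sum(v < cutoff for v in controls) / len(controls)
    return se, sp


def cutoff_at_specificity(cases, controls, min_sp=0.90):
    """Cutoff with the highest sensitivity among cutoffs whose specificity exceeds min_sp.

    Returns (cutoff, sensitivity, specificity), or None if no cutoff qualifies.
    """
    best = None
    for c in sorted(set(cases) | set(controls)):
        se, sp = sens_spec(cases, controls, c)
        if sp > min_sp and (best is None or se > best[1]):
            best = (c, se, sp)
    return best


def cutoff_by_youden(scores_cases, scores_controls):
    """Cutoff maximizing Youden's index, YI = Se + Sp - 1. Returns (cutoff, YI)."""
    best = None
    for c in sorted(set(scores_cases) | set(scores_controls)):
        se, sp = sens_spec(scores_cases, scores_controls, c)
        yi = se + sp - 1
        if best is None or yi > best[1]:
            best = (c, yi)
    return best


# Toy values (illustrative only, not study data).
controls = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
cases = [0.5, 0.95, 1.1, 1.2, 1.3]
print(cutoff_at_specificity(cases, controls))
print(cutoff_by_youden(cases, controls))
```

The first rule fixes specificity and accepts whatever sensitivity results, which suits single autoantibodies with low positive rates; the second balances the two and is applied here to model outputs (predicted probabilities).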

Study design
The design of the present study is illustrated in Fig.1. In the discovery phase, the HuProt protein microarray was tested with 10 LC and 10 NC samples, and 11 TAAbs were selected according to the selection criteria. In the validation phase, ELISA was then applied to examine the levels of the selected TAAbs in samples from 212 LC patients and 212 NC in validation cohort 1, where 10 TAAbs were validated and a diagnostic model with 4 TAAbs and CEA was established. In validation cohort 2, 5 of the 10 TAAbs were further validated in 105 MPN and 105 BPN patients, and a model with a TAAb and clinical characteristics was constructed for distinguishing malignant from benign PNs.

Candidate TAAs based on HuProt protein microarray
A total of 182 candidate TAAs were screened via protein microarray. The GO and KEGG analyses of the 182 genes are shown in Figure S1. GO analysis showed that these genes were closely associated with terms such as immune system process, biological adhesion and positive regulation of biological process (Figure S1a). In addition, KEGG analysis showed enrichment mainly in natural killer cell mediated cytotoxicity, glycosphingolipid biosynthesis, measles and cell adhesion molecules (Figure S1b).
Table S1. Figure S2. Table 3.
The level of each TAAb in the different groups is shown in Table S2. There was no significant difference in the level of any TAAb among LC patients with different characteristics (P>0.05).
The levels of 5 TAAbs (anti-SARS, anti-GOLPH3, anti-NSG1, anti-CD84 and anti-EEA1) were significantly higher in MPN patients than in BPN patients (P<0.05, Fig.5a). The AUCs of the 5 TAAbs ranged from 0.580 (95%CI: 0.503-0.657) to 0.630 (95%CI: 0.555-0.705), and the sensitivity ranged from 12.4% to 21.9% at a specificity over 90% (Fig.5b-5f, Table 3).

Establishment and evaluation of the model in discriminating MPN from BPN
In validation cohort 2, the 5 significant TAAbs (anti-SARS, anti-GOLPH3, anti-NSG1, anti-CD84, anti-EEA1), 3 traditional biomarkers (CEA, CYFRA211 and CA125) and 9 nodular characteristics of CT (number, cavity, spicule sign, vascular notch sign, lobulation sign, spines, pleural indentation, mediastinal lymph node enlargement, calcification) were used to establish a model for distinguishing MPN from BPN patients through logistic regression analysis. One hundred eighteen samples (60 MPN samples and 58 BPN samples) with results for the traditional biomarkers and CT were selected for further research. The discriminating performance of model 2 for MPN patients with different characteristics is described in Fig.6 and Table 5. The model could discriminate MPN patients in each subgroup from BPN patients (Fig.6a). Furthermore, the AUC of model 2 in all subgroups ranged from 0.683 to 0.983 (Fig.6b-6j, Table 5). The sensitivity of each subgroup ranged from 26.9% to 94.4% at the same specificity of 96.6% (Table 5). To ensure the same specificity in all subgroups (cutoff: 0.392), the sensitivity varied from 47.8% to 73.6%. Moreover, the accuracy of each subgroup ranged from 78.1% to 82.6% (Fig.6c-6h, Table 5). The AUC for patients with nodules of 3 cm or more in diameter (AUC: 0.938, 95%CI: 0.862-1) was higher than that for patients with nodules of less than 3 cm in diameter (AUC: 0.808, 95%CI: 0.722-0.894) (Fig.6i-6j, Table 5).
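As a worked illustration of how the reported indices relate to one another, the overall performance of model 2 can be expressed as a 2×2 table. The counts used below are not reported in the text; they are back-calculated as an assumption from the 60 MPN and 58 BPN samples together with the reported sensitivity (58.3%) and specificity (96.6%), which correspond to 35/60 and 56/58. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic indices from a 2x2 table.

    Note: +LR is undefined when specificity is exactly 1 (division by zero).
    """
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {
        "Se": se,
        "Sp": sp,
        "+LR": se / (1 - sp),
        "-LR": (1 - se) / sp,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "YI": se + sp - 1,
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Assumed counts, back-calculated from the reported percentages (not from Table 5).
model2 = diagnostic_metrics(tp=35, fn=25, tn=56, fp=2)
for name, value in model2.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, the sensitivity and specificity round to the reported 58.3% and 96.6%; the remaining indices follow from the same four counts and should be read as illustrative rather than as values reported by the study.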

Discussion
Recently, microarray technologies have been increasingly used in a broad range of applications, including biomarker discovery, profiling of immune responses, identification of enzyme substrates, and quantification of protein-small molecule, protein-protein and protein-DNA/RNA interactions (16). Many studies based on protein microarray have confirmed that TAAbs in serum can distinguish high-risk patients from normal individuals (14,15). In our research, 11 candidate autoantibodies to TAAs (SARS, ZPR1, FAM131A, GGA3, PRKCZ, HDAC1, GOLPH3, NSG1, CD84, DAB1 and EEA1) were screened for the diagnosis of LC by HuProt protein microarray in 20 serum samples. Subsequently, 10 TAAbs validated by ELISA (anti-DAB1 was excluded) were considered promising biomarkers for the detection of LC; these TAAbs could distinguish LC patients from healthy individuals with sensitivities ranging from 8.96% to 27.4% at a specificity over 90%. In addition, 5 of the 10 TAAbs were shown to distinguish MPN from BPN patients.
These results show that protein microarray is an efficient way to screen novel TAAbs for the detection of LC and MPN patients.
Seryl-tRNA synthetases (SARS) are ubiquitously expressed enzymes that play an essential role in the maintenance of the genetic code by coupling proteinogenic amino acids to their cognate tRNA(s) (21).
ZPR1 is a zinc finger protein that binds to the cytoplasmic tyrosine kinase domain of the epidermal growth factor receptor (EGFR) (22). Furthermore, it has essential roles in embryonic development and cell apoptosis. ZPR1 can translocate to the nucleus during the S phase to regulate cell cycle progression (23,24) and cell proliferation (25-27). ZPR1 promotes breast cancer cell invasion and migration via ERK/GSK3β/snail signaling (27). PRKCZ is a member of the PKC family that plays important roles in cell growth, metabolism and other associated signal transduction pathways (28). Positive PRKCZ expression in ADC was associated with cell invasion, lymph node metastasis, and the expression of MMP-2 and MMP-9 (29). NSG1, also designated D4S234E or NEEP21, is an endosomal protein expressed in neuronal cells under normal conditions and a direct transcriptional target gene of the tumor suppressor p53 (30,31).
CD84, also named SLAM5, is a member of the signaling lymphocyte activation molecule (SLAM) family, which contains homophilic and heterophilic receptors that modulate both adaptive and innate immune responses (32,33). Treatment of human T cells with CD84-Ig enhances TCR-induced IFN-γ secretion, presumably through homophilic engagement of cell surface CD84 (34). Early Endosome Antigen 1 (EEA1) is a protein responsible for vesicle budding, transporting, tethering, and docking events in early endosomes, and has also been shown to be involved in the PI3K pathway (35,36). Histone acetylation is tightly controlled by a balance between the opposing activities of histone acetyltransferases and histone deacetylases such as HDAC1 (37). Overexpression of HDAC has been found in several cancers (37-39) and thus influences their development and treatment. HDAC1 knockdown inhibits invasion and induces apoptosis in non-small cell lung cancer cells (40).
Among the above TAAbs, only anti-NSG1 has been shown to be highly expressed in LC (14).
Numerous studies have shown that combining markers from different levels can significantly improve the accuracy of a diagnostic test (4,18,19). Logistic regression is a common method for constructing diagnostic models and combining markers from multiple levels in various cancers (10,17,18). In our previous study, we employed logistic regression to combine CEA and anti-EGFR to gain a higher diagnostic capability (17). The AUCs of CEA and anti-EGFR were 0.681 and 0.703, respectively, while the combination of CEA and anti-EGFR reached an AUC of 0.784 (17). Similarly, in the present study logistic regression was used to build a panel of 4 TAAbs (anti-ZPR1, anti-PRKCZ, anti-NSG1 and anti-CD84) with an AUC of 0.747 in validation cohort 1. When the panel was combined with CEA, the AUC rose to 0.813 (95%CI: 0.762-0.864). Moreover, the model with four TAAbs and CEA could distinguish early-stage LC patients from NC with an AUC of 0.695.
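The idea of combining markers by logistic regression and comparing AUCs can be sketched as follows. This is a toy illustration, not the study's analysis (which was run in SPSS): the data are invented, the function names are hypothetical, and the model is fitted by plain batch gradient ascent rather than by the estimation routine of a statistics package.

```python
import math


def auc(case_scores, control_scores):
    """AUC as the probability that a case scores above a control (ties count 0.5)."""
    wins = sum((c > n) + 0.5 * (c == n) for c in case_scores for n in control_scores)
    return wins / (len(case_scores) * len(control_scores))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, x):
    """Predicted probability of being a case; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * xj for w, xj in zip(weights[1:], x)))


def fit_logistic(rows, labels, lr=0.1, steps=2000):
    """Fit a logistic regression by batch gradient ascent on the log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = y - predict(weights, x)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Toy data: each tuple is (marker A, marker B). Illustrative only.
cases = [(3, 0), (0, 3), (3, 3), (2, 2)]
controls = [(0, 0), (1, 0), (0, 1), (1, 1)]
weights = fit_logistic(cases + controls, [1] * len(cases) + [0] * len(controls))

auc_a = auc([x[0] for x in cases], [x[0] for x in controls])
auc_b = auc([x[1] for x in cases], [x[1] for x in controls])
auc_combined = auc([predict(weights, x) for x in cases],
                   [predict(weights, x) for x in controls])
print(auc_a, auc_b, auc_combined)
```

In this toy set each marker misses a different case, so the combined predicted probability separates the groups better than either marker alone. The AUCs here are computed on the training data, so they are optimistic in the same way as any model evaluated without an independent cohort.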
CT, an efficient radiological technology, has been shown to effectively reduce lung cancer-related mortality (3). An increasing number of models based on CT indicators and serum biomarkers with excellent diagnostic performance have been used for the differentiation of PN (4,42,43). Hence, we established a model with an
Several improvements have been made in the current study. First, the protein microarray used for the discovery cohort is one of the world's largest collections of full-length human proteins, covering 81% of the human proteome. Second, in contrast to the studies by Jiang (15) and Pan (14), we not only tested the levels of candidate TAAbs in NC and LC patients but also validated them in MPN and BPN patients in the validation phase. Finally, we combined TAAbs with traditional biomarkers and CT indicators to establish economical models for the diagnosis of LC and the discrimination of PN, with the aim of alleviating the economic burden on patients. These models show promising diagnostic capability in LC and MPN patients with different clinical features.
The limitation of our study is that these models have not been validated in a larger cohort, and their stability has not been validated in other independent groups.
It is also regrettable that we did not compare the discriminative ability of our models with that of models previously published by other researchers.
In summary, we applied the HuProt protein microarray to discover promising TAAbs for the detection of LC and the discrimination of PN. The established models, which performed well in identifying LC and MPN patients in this study, are expected to be useful in clinical applications.

Declarations
Ethics approval and consent to participate and publication
The study protocol was approved by the Ethics Committee at Zhengzhou University. Written informed consent to participate in this study and for publication was provided by all the participants of the study.

Availability of data and materials
The data supporting this study's findings are available from the corresponding author on reasonable request.

Supplementary Files
Supplementary files associated with this preprint: Supplementary.doc