Integration of Transcriptomic and Proteomic Proles Identies Potential Biomarkers in Idiopathic Pulmonary Fibrosis

Idiopathic pulmonary brosis (IPF) is an interstitial lung disease characterized by chronic progressive pulmonary brosis and a poor prognosis. Till now, no studies have been reported on revealing mechanisms and identifying biomarkers by integratively analyzing transcriptome and proteomes of IPF patients. Here we examined the landscape of IPF patients' gene expression in the transcription and translation phases and investigated the expression and functions of two new potential biomarkers. Differentially expressed (DE) mRNAs were mainly enriched in pathways of immune system activities and inammatory responses, while DE proteins are associated with extracellular matrix production and wound repair. The upregulated genes in both phases are associated with wound repair and cell differentiation, while the downregulated genes in both phases are associated with reduced immune activities and the damage of the alveolar tissues. On this basis, we identied thirteen potential marker genes. Among them, we validated the expression changes of BTNL9 and PLLP and investigated their functional pathways in the IPF mechanism. Both genes are downregulated in the tissues of IPF patients and Bleomycin-induced mice, and co-expression analysis indicates that they have a protective effect by inhibiting extracellular matrix production and promoting wound repair in alveolar epithelial cells.


Introduction
IPF is a type of interstitial lung disease with a poor prognosis and is associated with immune and in ammatory responses. It is characterized by persistent, progressive pulmonary brosis. Over 80% of IPF patients have an average survival period of 3-5 years after diagnosis. IPF primarily affects people over the age of 50, and its prevalence rises with age. There is a strong correlation between smoking or smoking history and the incidence of IPF. Despite extensive research, the etiology and pathogenesis of IPF are still not fully understood. Many hypothetical mechanisms, such as immune-mediated in ammation and malfunctioned alveolar epithelial cell (AEC) repairment, have been proposed in recent decades to contribute to the progression of IPF [1]. One of the most successful areas of IPF research has been the use of genome-wide studies to identify disease susceptibility motifs in familial and sporadic pulmonary brosis. Mutations in several genes, including surfactant proteins C (SFTPC) [2], surfactant proteins A2 (SFTPA2) [3], and mucin 5B (MUC5B) [4], have been identi ed as a driving factor for IPF development, and transcriptomic and proteomic studies have identi ed pathways and biological processes that may be involved in the IPF mechanism [5].
In addition to traditional genome analysis, research on the IPF transcriptome and proteome has provided new insight into the understanding of IPF's mechanism. Proteome analysis using two-dimensional gel electrophoresis and MALDI-TOF-MS on IPF lung tissues as early as 2011 identi ed up-and downregulated proteins in IPF, such as heat-shock protein 27 (Hsp27) [6]. Along with the development of novel experiment technologies, proteomes studies on IPF have been conducted using technologies such as twodimensional reversed-phase liquid chromatography and ion-mobility-assisted data-independent acquisition (HDMSE) [7], SOMAscan [8], and iTRAQ-Based LC-MS/MS [9,10]. The biological specimen used for the proteomics studies include lung tissues[6, 10], bronchoalveolar lavage uid (BALF) [7], and peripheral blood [8, 9,11]. These proteomics studies have detected DE proteins and identi ed biomarker for IPF, such as MMP7, AHSG, and VEGFR.
Although the application of whole transcriptomic technology in IPF is later than that of proteomics, several transcriptomic studies have provided new insights into the disease's mechanism. In 2018, transcriptomic analysis of IPF lung tissues revealed transcriptomic changes in normal-appearing and scarred areas [12]. In 2019, Sheu et al. investigated the expression changes associated with Nintedanib treatment in IPF Fibroblasts and identi ed down-regulated genes and associated pathways [13]. Sheu's team also identi ed dysregulated genes in IPF broblasts the same year [14].
As research on the proteomics and transcriptomics of IPF progresses, the need to integrate and analyze the comprehensive characterization of IPF gene expression by combining the two omics has gradually emerged. As a result, we designed the present experiment to jointly sequence and analyze the transcripts and proteomics of lung tissue samples from end-stage IPF patients. Our ndings demonstrated the differences and correlations between the characteristics of gene expression during the transcription and translation phases, as well as an overview of the non-coding RNA regulative network. We revealed the featuring pathological processes that occur during transcription and protein translation and identi ed promising new IPF-associated genes. Our research and efforts point in a new direction and provide guidance for unravelling the mystery of the IPF mechanism.

Material And Methods
Sample collection IPF lung tissues were collected from six end-stage IPF patients who underwent lung transplantation surgery at the First A liated Hospital of Guangzhou Medical University, Guangdong Province, China.
Healthy lung tissues were collected from ve lung doners. This study was approved by the ethics committee of The First A liated Hospital of Guangzhou Medical University (Reference number: 2018-92).
Signed informed consent was obtained from each patient. Diagnosis of IPF was con rmed by histology.

RNA-seq library construction and sequencing
Total RNA was extracted from the tissues using the Trizol (invitrogen) according to the manufacturer's protocol, and ribosomal RNA was removed using the Ribo-Zero™ kit (Epicentre, Madison, WI, USA).
Integrity of RNA was examined with the Bioanlyzer 2200 (Agilent). cDNA libraries were prepared using the Illumina TruSeq RNA Sample Preparation kit (Illumina). Fragmented RNA (the average length was approximately 200 bp) were subjected to rst-strand and second-strand cDNA synthesis following by adaptor ligation and enrichment with a low cycle according to instructions of NEBNext® Ultra™ RNA Library Prep Kit for Illumina (NEB, USA). The puri ed library products were evaluated using the Agilent 2200 TapeStation and Qubit®2.0 (Life Technologies, USA). The libraries were paired-end sequenced (PE150, Sequencing reads were 150 bp) at Guangzhou RiboBio Co., Ltd. (Guangzhou, China) using Illumina platform HiSeq3000.
Quality control of RNA sequencing reads Raw fastq sequences were treated with Trimmomatic tools [15] (v 0.36) using the following options: TRAILING: 20, MINLEN:25 and CROP:25, to remove trailing sequences below a Phred quality score of 20 and to achieve uniform sequence lengths for downstream clustering processes. Sequencing read quality was then inspected using the FastQC software [16]. Adapter removal and read trimming were performed using Trimmomatic. Sequencing reads were trimmed from the end (base quality less than Q20) and ltered by length (less than 25).

Quanti cation of mRNA expression
Paired-end reads were aligned to the human reference genome hg19 with HISAT2 [17]. HTSeq [18] (v0. 6.0) was used to count the reads numbers mapped to each gene. The whole sample's expression levels were presented as TPM (Transcripts Per Kilobase of exon model per Million mapped reads), which is the recommended and most common method to estimate the level of gene expression.

Differential expression analysis
The statistically signi cant DE genes were obtained by an adjusted p-value threshold of < 0.05 and |log2(fold change)| > 1 using the DEGseq2 software [19]. Finally, a hierarchical clustering analysis was performed using the R language package 'gplots' according to the TPM values of differential genes in different groups. And colors represent different clustering information, such as the similar expression pattern in the same group, including similar functions or participating in the same biological process.

GO terms and KEGG pathway enrichment analysis
All differentially expressed mRNAs were selected for GO and KEGG pathway analyses. GO was performed with KOBAS3.0 software [20]. GO provides label classi cation of gene function and gene product attributes (http://www.geneontology.org). GO analysis covers three domains: cellular component (CC), molecular function (MF), and biological process (BP). The differentially expressed mRNAs and the enrichment of different pathways were mapped using the KEGG pathways with KOBAS3.0 software.

Target mRNA prediction for DE lncRNAs
In this study, potential target genes for cis-or trans-acting of DE lncRNAs were predicted using different algorithms. Cis-acting target genes were identi ed by scanning the genome using ORF-nder [21] and BLASTP pipeline [22](e < 1×10-5). Protein-coding genes located within 10 kb the upstream or downstream of the lncRNA were obtained as cis-acting targets of the lncRNA. for the prediction of trans-acting target genes, mRNAs that have complementary sequences to lncRNAs were detected by BLASTN (e < 1×10-5), and then they were re-screened by the RNAplex tool [23].
Proteomic library construction and data acquisition For library generation by data-dependent acquisition (DDA), all 11 samples were pooled as a mixture and fractionated by high pH separation with 8 fractions. And all the samples were processed by dataindependent acquisition (DIA) individually to assess the proteome differences. First stage mass spectrometry (MS1) and second stage mass spectrometry (MS2) data were all acquired, and samples acquisition by random order. The iRT kit (Ki3002, Biognosys AG, Switzerland) was added to the samples to calibrate the retention time of extracted peptide peaks. Raw Data of DDA were processed and analysed by Spectronaut 14 (Biognosys AG, Switzerland) with default settings to generate an initial target list, which contained 94052 precursors, 87319 peptides, 9232 proteins, and 9119 protein group. Spectronaut was set up to search the database of uniprot-homo_sapiens.fasta database (version 201907, 20414 entries) assuming trypsin as the digestion enzyme. Carbamidomethyl (C) was speci ed as the xed modi cation. Oxidation (M) was speci ed as the variable modi cations. Q-value (FDR) cut off on precursor and protein level was applied 1%.

Proteomic analysis
Principal component analysis (PCA) was carried out separately on each data set using the R function 'prcomp()' from the package 'stats' with default parameters. Hierarchical Cluster Analysis (HCA) was processed with package 'pheatmap' (https://CRAN.R-project.org/ package=pheatmap). Volcano plot was drawn by using 'ggplot2' package [24]. The online tool of Metascape [25] was used to perform GO enrichment analysis. Pathway analysis was processed by KOBAS [20].

Multi-omics analysis
First, we collated the DE RNAs in transcriptomes and subdivided them into mRNA, miRNA, antisense RNA, lincRNA, and lncRNA (which is non-lincRNA and non-antisense). After that, we generated quantitative matrices of these RNAs and the DE proteins, where the RNAs were represented as normalized TPM, and proteins were represented as normalized quantitative signal intensity. Then the R packages 'mixOmics' (version 6.14.0) [26] and 'rgl' (version 0.105.12) were utilized to conduct the Data Integration Analysis for Biomarker discovery using a Latent cOmponents (DIABLO) analysis [27]. DIABLO is a multivariate integrative classi cation method that seeks common information and identi es key variables in multiple omics. Based on the analysis method of Partial Least Squares (PLS) and generalized canonical correlation analysis, DIABLO maximizes the common or correlated information between multiple omics datasets by selecting a subset of molecular features and discriminating between multiple phenotypic groups. The 'block.splsda' function in the mixOmics' package was used to integrate the omics and select key genes from each matrix via N-integration with sparse Discriminant Analysis. Then, the 'plotIndiv' function was used to provide scatter plots of the PLS -Discriminant Analysis (PLS-DA) analysis for each block, the 'plotDiablo' function was used to visualize the correlation between components from a different matrix, the 'circosPlot' function was used to display correlations between selected variable (i.e., RNAs, proteins) in different blocks in a circus, and the 'cimDiablo' function was used to generate a heatmap to represent the multi-omics molecular signature expression for each sample.
Classi cation and GO functional analysis of DE genes in transcription and translation We categorized the differentially expressed genes at the transcription and protein translation phases and studied the enriched pathways associated with each category of DE genes. We extracted all expression measurements from proteomics and transcriptomics, including log2 fold change (LFC) and p value, converted the gene IDs of the two matrices into consistent gene names, and merged the two matrices by the gene names. Using R language (version 4.0.3), we classi ed the genes based on their transcriptional and protein translational differences and plotted them in different colors. The cut-offs used for DE genes were p< 0.05, fold change > 2 for transcriptome expression and p< 0.05, fold change > 1.2 for protein levels. For the genes differentially expressed in both stages, we performed GO enrichment analysis using the Metascape tool [25].

DE analysis of public transcriptome datasets
Firstly, we searched the NCBI's GEO database [28] for high-quality transcriptomes from lung tissue of IPF patients. As a result, 91 datasets (52 IPF tissues vs. 39 healthy tissues) from four RNA-sequencing projects (GSE52463 [29], GSE83717 [30], GSE92592 [31], and GSE99621 [12]) were identi ed and downloaded using NCBI's sratoolkit (http://ncbi.github.io/sra-tools/, version 2.9.6-1). Secondly, the reads were ltered using the Trimmomatic tool [15] and were mapped to the human reference genome hg38 by STAR [32]. Then the transcript counts were calculated using the featureCounts software [33]. Then, the differential expression analysis was conducted by R package DEseq2 [19] following the standard protocol. The batch biases among different projects were controlled using the design function (design = project + status).

Bleomycin (BLM) IPF mouse model
Twenty-two C57BL/6 male mice were randomly divided into two groups: the IPF group (n = 9), and the control group (n = 13). BLM solution for use was prepared by dissolving 15mg BLM in 5mL 0.9% NaCl. Mice were anesthetized via intraperitoneal injection of 1% pentobarbital sodium (50mg/kg) and xed on the mouse plate. Either BLM (IPF groups) or saline (control group) 2.1mg/kg was administered into the glottis using a 100mL pipette. On day 21, the mice were sacri ced, and their lung tissues were collected for qPCR and Western blot.
Quantitative PCR (qPCR) RNA in tissues from human and BLM-induced mice was extracted by using the Trizol (Invitrogen®). The reverse transcription reaction was conducted following manufacturer's protocol (TaKaRa TM ). 5uL cDNA was mixed with 01uL primers and 10uL 2x SYBR Green qPCR SuperMix (QiaGen TM ) in a 20uL reaction.
PCR was performed in LightCycler® 480 II PCR system (Roche TM ). GAPDH was used as internal control.

Western blot
Tissues from human and BLM-induced mice were lysed with radioimmunoprecipitation (RIPA) lysis buffer (with phenylmethylsulfonyl uoride (PMSF)) (Beyotime Biotech TM ); the concentration of the protein solution was measured by the bicinchoninic acid (BCA) protein assay (KeyGene Biotech TM ). Protein was resolved by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) electrophoresis and then transferred onto methanol pre-wet polyvinylidene di uoride (PVDF) membranes. After incubation with secondary antibodies, the PVDF membranes were mixed with enhanced chemiluminescence (ECL) substrate (Thermo Scienti c™), the intensity of light was detected by the Bio-Rad imaging system.

Co-expression network of BTNL9 and PLLP
To probe the possible function pathways of BTNL9 and PLLP, we generated a co-expression network for each based on the public IPF transcriptome datasets prepared in section 2.7. The GSEA software [34] was used to calculate the enrichment score for each gene sets following its o cial guide. The networks were visualized using the Cytoscape software [35] and the gene set clusters were annotated by the AutoAnnotate application [36].

Quality control of transcriptomes and proteomes
Quality control of RNA-seq reads. The libraries were constructed for the RNA sequencing, and deep sequencing was completed for the ten (six IPF vs. four control) samples that met the quality requirements. The samples had an average of 150,114,026 ± 15,174,766 sequence Reads and 22,517,103,960 ± 2,276,214,831 bases. There was no signi cant diff erence between the control group and the IPF group in the measured number of sequences and bases (p <0.05). The average base error rate was 0.53±0.098%, and there is no signi cant difference between the control group and the IPF group (p <0.05). The average GC ratio (GC%) was 47.15±1.80%, the percentage of Q20% bases (error rate < 1%) was 93.87±1.18%, and G30% bases (error rate < 0.1%) was 86.30±2.02%. There was no signi cant difference in these three indicators between the two groups (p <0.05). After data ltering, the average clean Q3 ratio is 89.25±1.52%, and the cleaning rate is 90.81±0.83% (Table 1).
The bases had a homogeneous distribution along with the sequences, the maximum error rate is < 1%, and the minimum base quality (-10×log 10 (error P)) was above 30 ( Figure S1, A-C). The sequence quality was further improved after data ltering ( Figure S1, D-F). Mapping quality of RNA-seq reads. 94.7±0.005% reads were successfully mapped to the human reference genome. The detected gene number approached saturation along with the increase of mapped reads, indicating a good sequence depth of this experiment. The averaged mapped genes of all the samples were 27,000 to 33,000 ( Figure S2A). The quality control and comparison results show that the sequencing results met the quality requirements for further analysis. Among the detected RNA sequences, 81.33% derived from exons, 16.1% from intronic, and 2.57% from intergenic reads ( Figure S2B). The detected genes were evenly distributed across chromosomes by comparison with the human gene distribution map ( Figure S2C).
Quality control and quanti cation of proteomes. Libraries were generated for all 11 samples that met the quality requirements and the proteins were detected and quanti ed. As a result, 9,119 protein groups and 9,232 proteins were detected at the QC level of 1% FDR (Spectrum, Peptide, and Protein levels). The levels of 7,823 protein groups and 7,932 proteins were quanti ed at the QC level of 1% FDR (precursor and protein levels). The average coe cient of variation (CV) of the precursors was 40.80% and 31.90% for the control and IPF samples, respectively. The median of the precursors' CV was 40.6% and 32.4% for the control and IPF samples ( Figure S3A). The recovery rate (the ratio of the identi ed proteins to the indicators in the human protein library) was 68.90% and 77% and the completeness (the ratio of the average number of identi ed proteins to the number of parent ions quanti ed in the experiment) was 51.90% and 63.50% for the control and IPF group samples, respectively. The cumulative recovery plot shows that 85% of proteins from the protein spectrum database have been detected in the 11 samples ( Figure S3B). The completeness plot shows that the total completeness of all samples was 83.8%, with 4,200 proteins identi ed in all samples ( Figure S3C). Consistency analysis of the qualitative results showed that 3,200 proteins were detected in all samples, and another 3,000 proteins were detected in more than half of the samples ( Figure S3D). The heat map of all the detected proteins shows no signi cant differences in the identi cation and quanti cation among all samples ( Figure S3E).
Transcriptome Analysis of IPF Differential gene expression. We identi ed the DE genes between samples through two cut-offs: log fold change (|log2(fold change)| > 1) and signi cance level (adjusted p<0.05). In comparison with normal lung tissue, a total of 2531 genes were signi cantly differentially expressed in the lungs of patients with endstage IPF, including 1772 up-regulated and 759 down-regulated genes ( Fig. 1.A). Clustered heatmap ( Fig.  1.B) shows that the control and IPF groups could be well separated by the genes, while the samples of IPF number 1, 4, and 5 had a clearer contrast with controls.
GO and KEGG enrichment analysis was performed to probe the biological processes and signaling pathways associated with the DE genes ( Fig. 1.C). IPF lung tissues had a signi cant enrichment of biological processes and functional pathways that dominate the mechanism of IPF when compared to control tissues. These enriched pathways in uence the progression of IPF at the biological process, cellular structure, and molecular function levels. Eight of the top ten pathways are related to the immune system activities and in ammatory response. Furthermore, there are also the pathways of the construction of ECM, which replaces normal Alveolar tissue and deposits abnormally in IPF [37,38].
Differential lncRNA expression. To investigate the regulatory impact of lncRNA in the end-stage IFP patients' gene expression, we rst quanti ed their expression and identi ed DE lncRNAs. The results showed that a total of 604 lncRNAs were signi cantly differentially expressed in IPF lung tissue, including 410 up-regulated genes and 194 down-regulated genes (Fig. 2.A). Clustered heatmap (Fig. 2.B) showed that the expression of these DE lncRNAs could separate the IPF samples from the control samples. And IPF samples 1, 4, and 5 showed a clearer contrast to the control samples than the other three IPF samples.
As the lncRNAs mainly function by regulating the protein-coding target genes, we predicted the potential target genes of cis-regulation and trans-regulation for the lncRNAs. We then performed GO enrichment analysis on the target genes and analyzed the results with the signi cance threshold of p< 0.05 (Fig. 2.C).
Most of the enriched pathways were associated with the structure and function of lung epithelial apical junction, such as apical junction assembly and tight junction assembly. This implies that the DE lncRNAs in IPF may mainly promote the process of epithelial-mesenchymal transition (EMT), cell migration, accelerated brosis progression, innate immunity, as well as cellular differentiation and proliferation [39,40]. Besides, there are also two pathways related to apoptosis, such as the cysteine-type endopeptidase activity involved in apoptotic process.
Proteomics Analysis of IPF Principal component analysis was performed on the protein expression data using the PLS-DA method, and the top 2 components were plotted in Fig. 3.A. The results showed that the end-stage IPF tissues were more concentrated on the graph compared with normal tissues, indicating a higher homogeneity of protein expression and a more consistent within-group expression pro le. We performed a Welch's ANOVA test on the protein quanti cations and de ned the DE proteins by a threshold of adjusted p< 0.05 and fold change > 1.5. As a result, we got 1,532 DE proteins in IPF tissues, including 1,231 up-regulated proteins and 301 down-regulated proteins (Fig. 3.B). Fig. 3.C shows the top 10 enrichment results under the three categories of Biological Process, Molecular Function, and Cellular Component. We note that these enriched gene clusters were mainly focused on the negative regulation of TOR and TORC1 signaling, which are associated with the decreased metabolism and protein production, autophagy, and extracellular matrix (ECM) production in end-stage IPF [41][42][43].
The DE proteins were signi cantly enriched in 13 KEGG pathways (p < 0.05) (Fig. 3.D). According to previous studies, ve of them are associated with the pathology of end-stage IFP. The RAS signaling pathway is associated with cell apoptosis and regeneration [44,45], the tight junction and gap junction are associated with cell regeneration and junction construction [39,40], the mTOR signaling pathway regulates cell growth and metabolism [41][42][43], and nucleotide excision repair is associated with wound repair [46,47].

Multi-omics analysis
By interactively analyzing the expression matrices of RNAs of different types and proteins, we identi ed the key genes of each type that drive the discrimination between IPF and control tissues and investigated the correlations between the ncRNAs and the expression of mRNAs and proteins.
Using the DIABLO method, we identi ed the genes contributing most to the discrimination between IPF and control tissues. These top-contributing genes include 20 mRNAs, 20 proteins, ten lncRNAs, ten lincRNAs, ten antisense RNAs, and ten miRNAs. These top-contributing genes include 20 mRNAs, 20 proteins, 10 lncRNAs, 10 lincRNAs, 10 antisense RNAs and 10 miRNAs. Only the top-10 genes were kept from three types of ncRNAs, this is because they each have relatively small gene numbers (from 59-279).
First, we display the discrimination of the IPF samples and control samples by the PLS-DA plot (Fig. 4.A).
In the PCA plots, the control samples and IPF samples 1, 2, 4, and 5 were clustered closely in all blocks, while the IPF samples 6 and 3 were at longer distances from the other IPF samples. Among the six blocks of the expression matrix, the IPF samples are more discrete in the mRNA and protein blocks, while they are more homogeneous in the ncRNA blocks. Fig. 4.B shows the correlation structure between components from each expression matrix. There are very strong associations between ncRNAs, mRNA, and proteins, and the correlation coe cients between any two datasets ≥ 0.98. The results indicate a good matrices design that favors the separation of the two groups.
Second, we created a clustered heat map representing the multi-omics pro les of all the samples (Fig.  4.C). The image shows that these top-contributing genes from six matrices well represent the separation of gene expression features of the IPF and control group. The only exception is the IPF sample 6, the expression characteristics of which are similar to neither the control nor the other IPF tissues. This result is consistent with the PLS-DA plot, in which IPF is also clearly discriminated from other IPF samples on the rst component (x-axis). As the most important contributors to the expression characteristics, the topcontributing proteins include ROM01, T22D3, MIS12, ZN384, LHPL2, TANC2, DESI1, MEA1, ARID2, NFRKB, PKP2, MTG1, RIPR2, ARHGP, DPOA2, GNB1L, YETS2, IKZF1, MBOA2, and CEP57.
Third, we produced a Circos plot displaying the relationship between and within the top-contributing genes from the six matrices, the cutoff for the correlation coe cient was set as > 0.9 (Fig. 5). The strong correlations between mRNAs, proteins, and the ncRNAs indicate a universal regulatory effect of these ncRNAs on mRNA transcription and protein translation. Compared to the proteins, mRNAs had more strong links with the regulatory ncRNAs. Among the ncRNAs, the lincRNA has the most links with protein and mRNA, suggesting a signi cant regulatory role in IPF. In this co-expression network, the mostcontribute variables are the lincRNAs ENST00000437698.1 and ENST00000442197.1, the antisense RNAs ENST00000519197.1 and ENST00000566738.1, the lncRNAs NR_110255.1 and NR_024344.1, and the miRNAs NR_030340.1 and NR_030408.1.

DE genes classi cation and functional enrichment analysis
Genes signi cantly differentially expressed at both transcriptional and translational phases were extracted and classi ed into four categories based on the trend they were regulated. The threshold of p value was set as < 0.05, and threshold of fold change was set as > 2 for transcriptome and > 1.2 for proteome. The results are displayed in the quadrant diagram ( Fig. 6.A).
Classi cation of DE genes. A total of 78 genes were differentially expressed in both transcriptome and proteome, and they were divided into four categories according to their regulation. 24 genes were signi cantly up-regulated in both omics, such as TUBB3, IGLV1-47, and CAPS. A total of 46 genes were signi cantly downregulated in both omics, including AGER, BNTL9, and RETN. Eight genes had opposite regulation trends, three genes were signi cantly down-regulated in the transcriptome but up-regulated in the proteome: CSK, RAC2, and SEMA5B. Five other genes were signi cantly up-regulated during transcription but down-regulated during translation: EPS8L1, GON7, HOMER2, IGLV8-61, and PROC.
GO enrichment analysis. The 24 genes were most frequently located on chromosomes 4 and 11, which had 3 and 4 genes, respectively. Twenty-one genes had four or more isoforms, suggesting that isoforms may be more active in the lung tissue of patients with severe IPF. These genes are signi cantly enriched in seven biological pathways and high-level GO terms (Fig. 6.B). The enrichment network shows that the enriched functions were clustered in the biological process of regeneration and cell morphogenesis involved in differentiation (Fig. 6.D). These over-activated pathways participated in the cell regeneration, differentiation, and intercellular sequential generation, which probably due to the deteriorated tissue damage and regeneration processes in the end-stages IPF patients [1,48].
These genes downregulated in both omics mainly locate on chromosomes 1, 9, 17, and 19. The genes were signi cantly enriched in 12 biological pathways and high-level GO terms (Fig. 6.C). The enrichment network shows that the most enriched terms were the biological process of myeloid leukocyte activation, regulation of IL-1 β production, cell-cell communication, cellular extravasation, and lipid localization (Fig.  6.E). These signi cant compromised functions and biological processes in the end-stage IPF lung tissues might be associated with reduced immune activities and the damage and obliteration of the alveolar tissue [49].

Identi cation of new biomarkers and veri cation
For the 78 genes we obtained in the previous step, we further checked their transcript expression in the 91 public transcriptome datasets. The detailed information about their expression feature is supplied in supplementary table. then, we identi ed 13 candidate genes that have signi cant differential expression in three omics ( Table 2). Literature review shows that many of these top-rank candidate genes have been reported involving in IPF, which strongly prove the e ciency of our research approach. Among these thirteen genes, three of them have been reported to have signi cant impacts on the pathology of IPF (S100A4 [50], STX11 [51], and THY1 [52]), while another three have been reported to have differential expression in IPF yet not veri ed (BTNL9 [53], SELENBP1 [54], and LIMD1 [55,56]). Eleven genes had no studies reported in IPF (ADGRL2, CA4, IGLV1-47, LIMCH1, MID1IP1, PLLP, and QDPR).
BTNL9 and PLLP expression in lung tissues from human and BLM-induced mice Based on the results of three omics and the review of literature research, we decided to validate the expression of BTNL9 and PLLP in IPF. both genes' mRNA transcriptions were quanti ed using qPCR, and their protein expression in lung tissues was detected by Western blotting. qPCR showed that both PLLP's mRNA expression signi cantly reduced in IPF patients (Wilcoxon test, p< 0.01) (Fig. 7. B); BTNL9's mRNA expression had a non-signi cant reduction (Fig. 7, A). Western blot showed that PLLP's protein expression decreased in IPF patients (Fig. 7, C).
On day 21 after BLM induction, the immunohistochemistry showed that, compared with the saline group mice, the lung tissues of BLM-induce mice showed progressive pulmonary brosis and alveolitis. The serum hydroxyproline increased, accompanied by increased expression of type I collagen (COL I) and Fibronectin in the lung tissue. These results indicated that the BLM-induced mouse lung brosis model has been successfully established. qPCR and Western blot were then used to measure BTNL9 and PLLP's expression in the BLM-induced mouse model. The results showed that BTNL9 and PLLP's mRNA expression signi cantly reduced in BLM-induced mice (Wilcoxon test, p< 0.05), and their protein expression also decreased in BLM-induced mice (Fig. 7, D-F).

Co-expression networks of BTNL9 and PLLP
The co-expression network demonstrates the promoted and inhibited gene sets associated with the expression of speci c genes. The co-expression network of BTNL9 shows that its expression is associated with the promotion of endothelium establishment, vessel endothelium migration, and construction of cell-cell junction. BTNL9's expression is associated with the inhibited pathways such as immune system activity, production of extracellular matrix, and cilium production (Fig. 8). The coexpression network of PLLP shows that its expression is associated with the promotion of endothelium development, cell membrane, and cell junction development. It is associated with the inhibited pathways such as abnormal respiratory function, immune system activity, and cilium production (Fig. 9).

Discussion
IPF is a rapidly progressive interstitial lung disease. IPF patients suffer rapidly deteriorating pulmonary brosis and their average survival time after diagnosis is 3 -5 years. IPF is now widely recognized as an interstitial lung disease characterized by abnormal myo broblast proliferation and extracellular matrix deposition caused by disturbance of the repair process of repeatedly injured aged lung epithelial cells.
However, current research is still some way from fully understanding the pathogenesis of IPF, and only two anti brotic drugs have been shown to have a therapeutic effect on IPF in clinical trials to date [57,58].
Over the last two decades, studies on IPF's whole-genomics, including gene mutation, transcriptomics, and proteomics, have provided new perspectives for understanding the pathogenesis and pathological process of IPF, identifying biomarkers for diagnosis and prognosis, and searching for new therapeutic targets [1]. In addition to traditional genomic analysis, recent gene expression studies have con rmed the role of long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) are in the pathogenesis and progression of and IFP [59][60][61][62][63][64][65].
In this experiment, we depicted the expression characteristics and gene expression correlation network of IPF patients using an integrative analysis of the transcriptomes and proteomes of end-stage IPF patients' lung tissues. During transcription, DE genes in IPF patients were mainly enriched in immune-related pathways and some ECM-related pathways. DE genes in protein translation, on the other hand, were mainly enriched in biological functions and pathways associated with extracellular matrix production and deposition, such as negative regulation of TOR and TORC1 signaling, intracellular organelle part, and gap junction. This suggested that the upregulated transcription of immune-related genes might lead to the enhanced production of proteins associated with ECM production. The differences in gene expression characteristics during the transcriptional and protein translation phases indicate the signi cance of ncRNA's regulative impact in IPF. The multi-omics analysis of the proteomics and expression matrix of ve RNA subtypes revealed that non-coding RNAs are highly involved in the progress of IPF. Among them, lincRNAs have more correlation links to the mRNA and proteins. Antisense RNAs, lncRNAs, and miRNAs also had strong correlations. Due to the limitations of this study, we were unable to conclude speci c causal relationships between these interacting variables, which need to be investigated in the future using a new experimental design.
We further investigated the DE genes in both phases in attempting to identify novel biomarkers for IPF.
Twenty-two genes were signi cantly up-regulated during both transcriptional and translation phases, while 46 were signi cantly down-regulated. GO enrichment analysis revealed that in end-stage IPF patients, the most prominent processes were the enhanced AEC injury and repair, increased ECM production, and compromised immune activities.
Among the DE genes in both the transcription and translation phase, we examined the expression of BTNL9 and PLLP probed their possible roles in IPF. BTNL9 encodes the protein Butyrophilin Like 9 which is involved in cell-mediated immunity via the pathway of Class I MHC mediated antigen processing and presentation [66]. RNA-seq studies showed that it is down-regulated in IPF [67, 68] and chronic hypersensitivity pneumonitis [69]. Using qPCR and Western blot, we validate its reduced mRNA and protein expression in both IPF patients and BLM-induced mice. Co-expression analysis indicates that BTNL9 is associated with reduced immune response and might slow down IPF progression by inhibiting ECM production. It also might promote the wound healing of injured AEC by enhancing endothelium regeneration and cell-cell adhesion. PLLP encodes the plasmolipin which is involved in ion transportation and the formation of myelin sheath [70]. It might promote the differentiation of epithelial cells [71] and is downregulated in keratoconus epithelium of myopia patients [72]. Our study validated that PLLP is downregulated in the lung tissues from both IPF patients and BLM-induced mice. PLLP might protect the tissue by enhancing the development of endothelium, cell membrane, and cell-cell junction. Its downregulation is associated with impairment of respiratory function, which is consistent with previous observation in COPD patients [73]. These results indicate that both genes might play protective roles in IPF and their downregulation in IPF is associated with IPF progressions such as increased immune responses, ECM production, and impaired wound healing.
The authors acknowledge that this experiment has certain limitations. First, due to the rarity of the IPF incidence and the decreasing clinical application of biopsy in the diagnosis of IPF, the sample size of our study was relatively small, which might affect the credibility of our results. Second, although the endstage IPF patients were recruited following strict criteria, a certain degree of heterogeneity was still observed in sample six, which might have been caused by different phenotypic subgroups [74].
Nevertheless, the high homogeneity of the other samples might guarantee the credibility of our results. Third, although the expression and possible roles of BTNL9 and PLLP have been preliminarily probed, their functional pathways need to be further validated in future research.
In summary, in this study we sequenced and analyzed the transcriptomes and proteomes of end-stage IPF patients, portraying the landscape of end-stage IPF patient's whole-genome expression composed of DE genes, enriched biological processes, and the regulating networks. Based on this, we identi ed two IPF potential biomarker genes downregulated in both IPF patients and BLM-induced mice, BTNL9 and PLLP, which might protect against ECM production and promoting wound repair in alveolar epithelial cells. Our Availability of data and material: The authors declare that all data supporting the ndings of this study are available from the corresponding authors on reasonable request.
Code availability: The authors declare that code for data analysis in this study are available from the corresponding authors on reasonable request.   was log10 transformed and are displayed as colors ranging from red to blue as shown in the key. Both rows and columns are clustered using correlation distance and average linkage. C. Dot plot of signi cantly enriched gene sets. Each circle represents a term, the color is the log10 transformed the enrichment score, and the circle size is the number of genes that fall into the term.  Multi-omics analysis using DIABLO method. A. PLS-DA plot of expression matrices of proteins and RNAs.
The distances between samples represent the discrimination between two conditions, which were calculated based on the top-contributing genes in each gene type. The samples from different groups are represented with different colors. B. Correlation structure between the expression of different gene types.
Colors indicate the class of each sample. The number in the bottom left are the correlation coe cients between two expression matrices. C. Clustered heatmap of the mix-omics signature variables. Con1-Con5 represent the four control lung tissues, while IPF1-IPF5 represent the six IPF lung tissues. Both rows and columns are clustered using correlation distance and average linkage. This map displays the scaled expression of above the top-contributing genes from six matrices.     Co-expression network of PLLP. Red nods are enriched gene sets, and blue nodes are inhibited gene sets.
Node size represents the gene number of the gene set, and edge width stands for the similarity between gene sets.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.