Distinct gut microbial communities and metabolic functions are associated with cachexia in lung cancer patients

Cachexia is associated with decreased survival in cancer patients and has a prevalence of up to 80%. The etiology of cachexia is poorly understood, and limited treatment options exist. Here, we investigated the role of the human gut microbiome in the clinical setting by integrating shotgun metagenomics and plasma metabolomics of 38 lung cancer patients with known cachexia status. The cachexia group showed significant differences in gut microbial composition, functional pathways of the metagenome, and related plasma metabolites compared to non-cachectic patients. Branched-chain amino acids (BCAAs), methylhistamine, and vitamins were significantly depleted in the plasma of cachexia patients, which was also reflected in the depletion of relevant gut microbiota functional pathways. The enrichment of plasma BCAAs and 3-oxocholic acid in non-cachectic patients was positively correlated with the gut microbial species Prevotella copri and Lactobacillus gasseri, respectively. Furthermore, the gut microbiota capacity for lipopolysaccharide biosynthesis was significantly enriched in the cachectic cancer patients. The involvement of the gut microbiome in cachexia was further observed in a high-performance machine learning model that uses solely gut microbial taxonomic and pathway features to differentiate cachectic from non-cachectic cancer patients.


Background
Cachexia is a multifactorial disorder frequently observed in cancer patients, characterized by weight loss, muscle wasting, adipose tissue changes, physical dysfunction, and appetite loss (anorexia) induced by inflammation and abnormal metabolism [1]. The presence of this syndrome limits the treatment options for cancer patients and leads to a decrease in quality of life and survival [2]. Cachexia presents to varying degrees in cancer patients depending on the cancer type [3], with the highest incidences in gastrointestinal (80%) [4] and lung (60%) cancer patients [5]. Although the underlying etiology of cachexia is not fully understood, the involvement of cytokine physiology has been suggested in both mouse models and humans. A decrease in the anabolic factor insulin-like growth factor-1 and an increase in inflammation-related catabolic factors such as interleukin-6 (IL-6), interferon-gamma (IFN-γ) and tumor necrosis factor-alpha (TNF-α) were observed in cachectic patients [6]. Additionally, the level of lipopolysaccharide-binding protein (LBP) in serum has been suggested as a biomarker of cachexia [7].
Several approaches have been proposed for the treatment of cancer cachexia, including targeting catabolic factors, appetite stimulation, and muscle regeneration; however, these have had limited salutary effects [8]. Nutrients such as omega-3 fatty acids (eicosapentaenoic acid and docosahexaenoic acid) to mitigate inflammation, and leucine and milk proteins to promote protein synthesis, have also been suggested as potential treatment options [9]. However, no single therapeutic approach is sufficient to treat this multifactorial disorder, and multimodal therapy considering nutrition, exercise, and pharmacological agents is likely needed [8,10].
The gut microbiota is gaining attention as a new target for cachexia treatment, due to its critical role in providing depleted nutrients, modulating gut hormones and cachexia-related cytokines, and improving gut barrier function [11]. Furthermore, the gut microbiota has been associated with different disorders including those that share symptoms with cachexia, such as anorexia [12], malnutrition [13], and chronic fatigue syndrome (CFS) [14]. A recent study by Potgens et al. investigated the gut microbiota in cachectic mice with colon carcinoma and linked cachexia with Klebsiella oxytoca, a specific gut bacterial species involved in altering gut barrier function [15]. With the aim of reversing cancer cachexia, a particular strain, Faecalibacterium prausnitzii A2-165 (DSM 17677), has been used in cachectic mice with colon carcinoma. However, it did not modify gut permeability, and no biomarkers of gut barrier function were altered [7]. Moreover, all studies to date are limited to murine models with either colon cancer or leukemia, and the analytical approaches used to disentangle microbiome composition were all based on 16S rRNA gene sequencing, a less informative and less sensitive methodology than shotgun metagenomic sequencing [7,15,16].
Here, we performed an in-depth analysis of the plasma metabolome and of gut bacterial taxonomy and functionality in 31 human lung cancer patients by applying untargeted metabolomics to patient plasma samples and shotgun metagenomics to collected stool samples. Specific metabolites, intestinal microbial species and their metabolic pathways were associated with cachexia status. To obtain a comprehensive picture of the role of the gut microbiome in cachexia, we subsequently integrated the taxonomic and functional signatures with the metabolomics data. A machine learning classifier of cachectic and non-cachectic patients, which takes the combinatorial effect of microbiota features into account, was also developed and further supported the putative role of the gut microbiota. We aim to identify microbiome associations with cachexia to open the way for new therapeutic options for this critical medical condition that influences cancer treatment outcomes.

Results
Cachexia affects survival probability of lung cancer patients
Thirty-one lung cancer patients, 12 women and 19 men, were enrolled at the National Koranyi Institute of Pulmonology (Budapest, Hungary) and at the County Hospital of Torokbalint (Torokbalint, Hungary). The patients were classified as A (well nourished, n = 19), B (moderately malnourished or suspected of being malnourished, n = 8), or C (severely malnourished, n = 4) based on the abridged Patient-Generated Subjective Global Assessment (aPG-SGA) [17]. We merged groups B and C, referred to herein as the cachexia group (n = 12), whereas patients classified as A served as the non-cachexia group (n = 19). There were no significant differences in the distribution of the four lung cancer subtypes or of cancer stage between the two groups (p > 0.05 for both subtype and stage, Fisher's exact test, Supplementary Table S1). As expected, the cachexia group had a significantly lower body mass index (BMI) than the non-cachectic patients (p = 5.7e-08, Wilcoxon rank-sum test) (Fig. 1A). Univariate survival analysis demonstrated that the cachexia patients had a significantly lower survival probability (vs non-cachexia, p = 0.0051, log-rank test) (Fig. 1B); furthermore, survival was significantly higher in patients with SGA score A than in those with score B or C (p = 0.0019, log-rank test) (Fig. 1C).

Lower level of plasma BCAAs in cachexia
To characterize the plasma metabolite profile in cachexia in a clinical setting, we collected plasma samples from our patients and subjected them to untargeted metabolomics analysis using ultra-high-performance liquid chromatography-quadrupole time-of-flight mass spectrometry (UHPLC-QTOF-MS). In total, more than 5000 metabolite features were captured, of which 314 common metabolites were identified in a semi-targeted manner. Multivariate statistical analysis showed that the metabolomic profiles of the cachexia and non-cachexia patient groups were clearly distinguishable (p = 0.026, ANOSIM), with comparatively scattered cachectic samples, suggesting higher variability in the cachectic patients (Fig. 2A). Metabolite classes such as amino acids, vitamins, and indoles were significantly depleted in cachectic lung cancer patients (Fig. 2B).
In total, 41 individual metabolites were identified as differentially abundant between the two groups (p < 0.05, Student's t-test) (Figure S1). Overall, essential amino acids such as isoleucine, leucine and tryptophan were significantly more abundant in non-cachectic patients (Fig. 2C and S2). A low serum cholesterol level has previously been suggested as a biomarker for malnutrition [18]. In our data, there was no difference in serum cholesterol level between the two groups (p = 0.774, Student's t-test). This may imply that the depletion of serum essential amino acids in the cachectic group likely involved other factors, e.g. the gut microbiota, rather than differences in dietary intake. Consistent with leucine and isoleucine, valine, another branched-chain amino acid (BCAA), was also lower in cachectic patients, although the difference was not statistically significant (p = 0.103, Student's t-test). Of note, the depletion of plasma BCAAs has also been shown in children with severe kwashiorkor (malnutrition caused by a lack of protein in the diet). In contrast, an increase in plasma BCAA levels has been observed in type 2 diabetes (T2D) and obese subjects compared to healthy people [19], highlighting the high relevance of plasma BCAAs to metabolic balance. Accordingly, leucine, as a representative BCAA, has previously been suggested as a dietary supplement for tackling cachexia [9]. In comparison, pipecolic acid, a non-proteinogenic cyclic amino acid produced during the degradation of lysine, was the only amino acid significantly enriched in our cachectic patients (p < 0.05, Student's t-test). This may result from excessive degradation of lysine due to increased protein degradation and decreased protein synthesis. The level of pipecolic acid has been reported to be elevated in patients with liver cirrhosis and hepatocellular carcinoma [20].
In summary, untargeted metabolomics revealed key circulating plasma metabolites in cachectic lung cancer patients that may have potential clinical relevance in cachexia syndrome development or progression. Alteration of blood metabolites might be associated with gut microbiota and their metabolic pathways, as demonstrated before [21].
Cancer cachexia patients have a distinct gut bacterial profile
Next, we analyzed changes in the gut microbiome according to cancer cachexia using 31 fecal samples collected from our lung cancer patients. Bacterial DNA was isolated from the fecal samples and used for shotgun metagenomic sequencing at an average depth of 6 Gbp. We compared the gut microbiome composition between cachexia and non-cachexia, and observed no differentially abundant phyla between the two groups (Fig. 3A). Regarding microbiota community diversity, no significant difference was observed in alpha-diversity between cachexia and non-cachexia patients (p > 0.05, Wilcoxon rank-sum test) (Fig. 3B). However, principal coordinate analysis (PCoA) based on Bray-Curtis dissimilarities revealed significantly different microbiota compositions between the two groups (p = 0.001, ANOSIM) (Fig. 3C). Additionally, in concordance with the aforementioned plasma metabolomics data, the cachexia group showed significantly higher variability of gut microbiota composition than the non-cachexia group (Figure S3). No significant associations were found between overall microbiota composition and cancer stage in our cohort (p > 0.05, PERMANOVA). Subsequently, we compared the bacterial species composition of our lung cancer samples with a large healthy European cohort (n = 471, Dutch) [22]. The cachexia group was placed distinctly from the other groups (non-cachexia cancer patients or healthy individuals) in the ordination plot (p = 0.001, ANOSIM) (Fig. 3C). The dissimilarity between the cachexia group and healthy lean people also reflected the complexity of the gut microbiota structure of cachexia patients, which does not merely resemble that of lean people.
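As an illustration of how the ANOSIM statistic reported above can be computed, the following minimal Python sketch derives R from rank-transformed dissimilarities and a permutation p-value from shuffled group labels. It is not the code used in this study; the function names and the toy distance matrix are assumptions introduced for illustration only.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(dist, groups):
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2),
    where M is the number of sample pairs."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim(dist, groups, n_perm=999, seed=0):
    """Return (R, p), with p estimated by permuting the group labels."""
    rng = random.Random(seed)
    observed = anosim_r(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if anosim_r(dist, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy example (hypothetical data): six samples on a line, two well-separated groups.
positions = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
toy_groups = ["non-cachexia"] * 3 + ["cachexia"] * 3
r_stat, p_value = anosim(toy_dist, toy_groups, n_perm=199)
```

In practice the dissimilarity matrix would hold Bray-Curtis dissimilarities between samples, and R close to 1 indicates that between-group dissimilarities exceed within-group ones.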
We assessed the Firmicutes/Bacteroidetes (F/B) ratio, which has been hypothesized to be lower in cachexia patients due to its reported association with obesity and BMI [23,24], but observed no significant difference between the two groups in our cohort (p = 0.1196, Wilcoxon rank-sum test). Moreover, no significant positive correlation was found between the F/B ratio and BMI in our lung cancer cohort (p = 0.5747, rho = 0.1044, Spearman's rank correlation).
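For readers unfamiliar with these quantities, a minimal Python sketch of the F/B ratio and of Spearman's rank correlation is given here. It illustrates the definitions only and is not the analysis code of this study; the function names are hypothetical, and the p-value calculation is omitted.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def fb_ratio(firmicutes_abundance, bacteroidetes_abundance):
    """Firmicutes/Bacteroidetes ratio from two phylum-level relative abundances."""
    return firmicutes_abundance / bacteroidetes_abundance
```

Given per-patient phylum abundances and BMI values, `spearman_rho([fb_ratio(f, b) for f, b in phyla], bmi)` would give the rho of the kind reported above.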
Next, we focused on comparisons at the species level and identified fifty-one differentially abundant bacterial species between the two groups (p < 0.05) (Figure S4), most of which (n = 44) remained significant after multiple hypothesis testing correction (FDR-corrected p < 0.05). A total of 13 significant species (Fig. 3D) were also prevalent (prevalence higher than 20%) among all patients, the vast majority of which were more abundant in the non-cachexia group. Prevotella copri showed significantly lower abundance in cachectic patients (FDR-corrected p = 0.006), in whom the depletion of plasma BCAAs was observed.
Notably, P. copri has been associated with enhanced gut microbiota biosynthesis and circulating levels of BCAAs [25]. Klebsiella oxytoca, a species previously associated with cancer cachexia in mice [15], was found to be significantly more abundant in lung cancer patients with cachexia (p = 0.013, FDR = 0.052), though with low prevalence in this human cohort. Next, we analyzed Faecalibacterium prausnitzii, a gut bacterium with anti-inflammatory and gut barrier-enhancing properties [26,27], which as a treatment option did not improve the gut permeability or the gut barrier function of cachectic mice [7]. Importantly, in our human study, F. prausnitzii was significantly more abundant in non-cachectic patients, though this was detected only by the Wilcoxon rank-sum test (p < 0.05). Further strain-level analysis of F. prausnitzii showed that another strain, M21/2, had a larger difference between non-cachexia and cachexia patients (p = 0.101, Wilcoxon rank-sum test) than the previously investigated strain A2-165 [7], suggesting the potential of alternative strains in future treatment (Figure S5).
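The FDR values quoted above come from a multiple-testing correction whose exact procedure is not named in the text. Assuming the Benjamini-Hochberg procedure (a common choice, and an assumption here), a minimal Python sketch of the adjustment is given for illustration; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each sorted p-value is scaled by m / rank, then a running minimum is
    taken from the largest p-value downwards to keep the result monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A species is then called significant when its adjusted value falls below the chosen threshold (0.05 in this study), which is why a raw p-value below 0.05 can still correspond to an FDR above 0.05.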
Previous studies have revealed considerable effects of the gut microbiota on blood metabolite profiles [21,28]. In our lung cancer cohort, we also observed a significant correlation between the overall plasma metabolome and the gut microbial species (p = 2e-04, r = 0.3433, Mantel test). To further disentangle the interplay between gut microbiota taxonomy and the plasma metabolite pool, we correlated the microbial species and metabolites that were significantly differentially abundant between cachexia and non-cachexia patients. The plasma level of isoleucine, a BCAA, was significantly positively correlated with the abundance of P. copri (Fig. 3E), as demonstrated before [25]. 3-Oxocholic acid was more abundant in non-cachexia patients and was positively correlated with the gut species Lactobacillus gasseri, a potential probiotic [29]. Accordingly, L. gasseri was also enriched in the non-cachexia group (vs cachexia, p = 0.021, FDR = 0.082) (Fig. 3D). These results further support the association between gut microbiome alterations and circulating plasma metabolites that may have substantial clinical implications in cachexia syndrome development or progression.
Alteration of gut microbiota metabolic pathways associated with cachexia
The use of shotgun metagenomic sequencing also enabled us to examine the variation of gut microbiota functions according to cachexia in lung cancer patients. Using the MetaCyc pathway abundances based on UniRef90 gene annotation results, we observed no significant differences in functional alpha diversity between cachectic and non-cachectic patients (p = 0.48 and p = 0.86, Shannon and Simpson index, respectively, Wilcoxon rank-sum test). In contrast, we found a significant difference in microbial community functional profiles (Bray-Curtis dissimilarities calculated from MetaCyc pathway abundances) between the two patient groups (p = 0.035, ANOSIM). Furthermore, the overall microbiota functional profiles of the cachexia group were more variable than those of the non-cachexia group (Figure S6), in accordance with the metabolomic and gut taxonomy profiling.
By directly comparing the overall abundances of pathways, we found that catabolic pathways of certain complex carbohydrates (starch, mannan) and sugar derivatives (glucuronide, fructuronate, myo-, chiro- and scillo-inositol), as well as anabolic pathways of several amino acids, were significantly lower in the cachexia patient group compared to the non-cachexia group (p < 0.05, Wilcoxon rank-sum test) (Figure S7). Such decreased gut microbiota biosynthesis of amino acids under cachexia, including isoleucine, threonine, serine and glycine, is in agreement with our plasma metabolomics-based findings described above, especially for BCAAs. Next, we performed KEGG pathway enrichment analysis using GAGE [30], an approach that identifies concordant changes of the genes present in a particular pathway.
As a result, purine and methane metabolism pathways were enriched in the cachexia group (Fig. 4A). In line with our findings, alteration of purine metabolism has been observed in the human gut microbiota after body weight loss induced by Roux-en-Y gastric bypass, as well as in the comparison between older and younger people who differ in muscle mass [31]. Methane may reduce appetite by direct stimulation of the intestinal hormone glucagon-like peptide-1 (GLP-1) [32]. Of note, the methanogen Methanobrevibacter smithii was identified as the signature species of our cachectic group (p < 0.05, IndVal test) and has been associated with anorexia, metabolic abnormalities [33] and chronic constipation [34]. In addition, heterolactic fermentation, which was found enriched in cachectic patients (Figure S7), might be highly relevant to methanogenesis, as lactate is the most favorable substrate for methanogens [35]. To further assess the credibility of our pathway analysis, we next investigated functions with known involvement in cachexia. A recent study has identified lipopolysaccharide-binding protein in the serum as a new biomarker of cancer cachexia [7]. Lipopolysaccharide (LPS) is a proinflammatory bacterial compound that can reduce intestinal barrier function, translocates in greater amounts when the gut barrier is altered [36], and induces muscle catabolism mediated by Toll-like receptor 4 (TLR4) [37]. Our analysis confirmed the significant enrichment of the microbiota LPS biosynthesis pathway in cachectic versus non-cachectic cancer patients (p < 0.05) (Fig. 4A).
Given the differential abundances of specific carbohydrate degradation pathways, we then compared the abundances of carbohydrate-active enzymes (CAZy) in the two patient groups. At a high level in the CAZy hierarchy, all CAZy classes were more abundant in the cachexia group (Figure S8), although not significantly so (p > 0.05, Wilcoxon rank-sum test). Across all samples, 439 CAZy families were detected. Twenty-nine CAZy families, including 18 enriched and 11 depleted in cachectic patients, were significantly differentially abundant (p < 0.05, Wilcoxon rank-sum test), the majority of which belong to the glycoside hydrolase class.
We identified several microbial species that correlated significantly with differentially abundant plasma metabolites (Fig. 3E). Interestingly, the functional pathways of the gut microbiota had more and stronger correlations with those metabolites than microbial species did (Fig. 4B). This implies that the influence of the gut microbiota on the plasma metabolite profile more likely arises through the combinatorial effects of multiple bacteria, or a microbial consortium, than through individual microbial species. Furthermore, it highlights the importance of microbiota functions in interacting with the plasma metabolome and affecting host phenotypes, including cachexia. Additionally, the positive correlation between the plasma level of L-isoleucine and the gut microbial pathway "PWY-5101: L-isoleucine biosynthesis II" (p = 0.010, r = 0.509, Spearman's rank correlation) suggests an impact of amino acid biosynthesis in the gut on plasma amino acid levels. The plasma level of methylhistamine was correlated with a range of gut microbiota pathways and was significantly enriched in the non-cachectic patient group (Fig. 4B). Similarly, the plasma levels of vitamins, such as the B vitamins pyridoxal and pyridoxamine (both lower in the cachectic group), also showed multiple correlations with microbiota functions and species (Figure S9).
In summary, our results highlight the distinct gut microbiota functional capacity in cachectic patients and the close relationship between gut microbial functions and the plasma metabolites in cachexia.
Gut microbiota features as a proxy of cachexia status
Next, we built a Random Forest classification model with 5-fold cross-validation using microbiota-derived features to further test the associations between the human gut microbiome and cachexia in a clinical setting. We also sought to verify the relevance and robustness of the identified differentially abundant taxa and pathways. In our European cohort (non-cachexia [n = 19], cachexia [n = 12]), the AUC of the model based on differentially abundant species (n = 51) was only 0.577, while using the differentially abundant MetaCyc pathways as features (n = 27) improved the performance (AUC = 0.849) (Fig. 4C). With combined features of species (n = 51) and MetaCyc pathways (n = 27), the AUC reached 0.875. This high-performance machine learning model, which takes into account the complex microbial interactions, further suggests an essential role of the gut microbiota in cachexia development. The model helped to identify a group of microbial species and functions (Figure S10) whose combinatorial effects are probably associated with cachexia in lung cancer patients and could potentially guide gut microbiota modulation for preventing or treating cachexia. Presumably, a simplified model in which a single bacterium has a profound effect on cachexia status is not sufficient, considering the high complexity of this disorder. This complexity might be one reason why supplementation with Faecalibacterium prausnitzii for cachexia treatment in mice was not successful [7]. The hybrid model had moderate performance (AUC = 0.7) when applied to an independent validation cohort of seven lung cancer patients (non-cachexia [n = 5], cachexia [n = 2]) recruited in a US clinic, which could also be attributed to microbiome differences between the US and our European patients independent of cachexia status.
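To make the evaluation metric concrete, the sketch below shows how an AUC can be computed from held-out classifier scores and how samples might be split into class-balanced folds. It does not reproduce the Random Forest model of this study; both function names and the round-robin fold assignment are assumptions for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample
    (label 1) scores higher than a randomly chosen negative sample
    (label 0), counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def stratified_folds(labels, k=5):
    """Assign sample indices to k folds, spreading each class round-robin."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for position, idx in enumerate(members):
            folds[position % k].append(idx)
    return folds


# Cohort sizes from the text: 19 non-cachexia (0) and 12 cachexia (1) patients.
cohort_labels = [0] * 19 + [1] * 12
cohort_folds = stratified_folds(cohort_labels, k=5)
```

In a cross-validation loop, a model would be trained on four folds, scored on the fifth, and the pooled held-out scores passed to `roc_auc`. With only 12 cachexia patients, each fold contains just two or three positive samples, which is one reason AUC estimates in a cohort of this size carry wide uncertainty.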

Discussion
Cancer cachexia is associated with worse performance status and frequently limits oncotherapy administration. To date, there is no effective therapy to prevent cachexia. Using 16S rRNA sequencing, a less advanced technique than the shotgun metagenomics used in our work, recent studies have shown the involvement of the microbial community in cachexia by analyzing the gut microbiome in murine models of colon cancer and leukemia [7,16]. Here, we performed shotgun metagenomic sequencing and untargeted plasma metabolomics in a cohort of cancer patients, aiming to identify gut bacterial species and metabolic functions that are associated with cachexia. Through a comprehensive and integrative analysis of these -omics data, we disentangled multiple links among gut microbial species, functions, and plasma metabolites, which may collectively and ultimately contribute to the development of this complex and multimodal disorder. Importantly, our findings not only provide clinical evidence of a link between gut microbiota and cachexia but also confirm previous results from preclinical animal models. The critical role of the gut microbiome in cachexia was further supported by a machine-learning model taking into account complex combinatorial effects of gut microbiota features. This model discriminated patients with different cachexia status with high performance in cross-validation (AUC = 0.875) and with moderate performance in an independent validation cohort (AUC = 0.7).
Despite our increased understanding of cachexia, previous work on the gut microbiota was based on mouse models that cannot fully recapitulate human cancer cachexia [38]. The lack of appropriate mouse models and the differences in complexity between human and mouse metabolic pathways have hindered more in-depth investigation and the assessment of potential interventions. Very recently, Talbert et al. [38] developed a mouse model named KPP that better models the cachexia experienced by cancer patients. This model, or another model that more closely resembles cachectic lung cancer patients, could be used to further validate our findings in the future. A possible limitation of our study is that we included only lung cancer patients. The relatively small sample size might also contribute to the non-significant difference in the Firmicutes/Bacteroidetes ratio. A very recent study has demonstrated poor similarity in gut microbial taxonomic abundances between humans and mice after fecal transplantation [39]. This further highlights the importance of investigating the relationship between cancer cachexia and the gut microbiota in more human clinical samples. Despite these limitations, our investigation of the gut microbiota and plasma metabolome in cachectic lung cancer patients achieved findings consistent with those of preclinical studies and of studies linking the gut microbiota with features of cachexia such as body weight loss, low muscle mass and low appetite [31].

Conclusions
Our study offers a snapshot of gut microbiota and plasma metabolome alterations in lung cancer patients with cachexia. To our knowledge, this is the first endeavor to investigate the role of the gut microbiota in cachexia in the clinical setting. It is expected to pave the way for relevant clinical research in the future to attenuate, prevent or treat cachexia. More importantly, our study sheds substantial light on possible clinical targets for tackling cancer cachexia. Future nutritional supplements could include amino acids together with metabolites such as methylhistamine and the bile acid 3-oxocholic acid. From the microbiota point of view, it might be beneficial to use treatments that can reduce gut inflammation and restore gut barrier function disrupted by increased LPS production. Microbial cocktails or probiotics containing mixtures of beneficial species identified in this study, such as Lactobacillus gasseri and Prevotella copri, might be further tested in the future. Fecal microbiota transplantation to restore a healthy microbiota might also be a possible future approach to assess the clinical importance of the gut microbiota in cachexia. Lastly, combinations of different modes of therapy may be more effective due to the metabolic complexity of this disorder. Future prospective studies are needed to confirm these findings.

Methods
Ethics Statement. Our study was performed in accordance with the guidelines of the Helsinki Declaration of the World Medical Association. The national-level ethics committee (Hungarian Scientific and Research Ethics Committee of the Medical Research Council (ETTTUKEB-50302-2/2017/EKU)) officially approved the study. All recruited patients consented to the study. The clinicopathological information was collected and patient identifiers were then removed, so that patients cannot be identified either directly or indirectly.
Study population. In total, 31 lung cancer patients (12 female and 19 male) were enrolled between 2017 and 2018 at the National Koranyi Institute of Pulmonology, Budapest, Hungary and at the County Hospital of Pulmonology, Torokbalint, Hungary (Supplementary Table S1). We included patients with histologically confirmed adenocarcinoma (ADC) (n=16), squamous cell carcinoma (SCC) (n=10), non-small cell lung carcinoma not otherwise specified (NSCLC-NOS) (n=1) and small cell lung carcinoma (SCLC) (n=4). Fifty-eight percent (n=18) of the included patients were diagnosed with advanced-stage disease (Stage IIIB/IV). Clinical TNM (Tumor, Node, Metastasis) stage according to the Union for International Cancer Control (8th edition) and age at the time of diagnosis were recorded. Patients were scored A (n=19), B (n=8) or C (n=4) based on the abridged Patient-Generated Subjective Global Assessment (aPG-SGA) [17]. The SGA scores were determined based on BMI, weight changes, food intake, symptoms affecting eating (appetite), and functional capacity. Clinicopathological data included gender, age, stage, and overall survival (OS). OS was calculated from the time of diagnosis until death or last available follow-up. The date of the last follow-up included in this analysis was February 2019.
Treatments. All treatments across all centers were conducted in accordance with contemporary National Comprehensive Cancer Network guidelines.
Schedule of sample collection procedures. Stool and blood baseline samples were obtained at the same time point before the initiation of systemic therapy after signed informed consent was obtained. All samples were placed on the day of collection in the -80°C freezer.
US validation cohort information. Stool samples were collected from a human lung cancer cohort of 7 individuals (Supplementary Table S2) at Western Regional Medical Center, Goodyear, Arizona, USA, after signed informed consent under a protocol approved by the Western Institutional Review Board (WIRB protocol number 20140271, Puyallup, Washington, USA). Bacterial DNA was subjected to Illumina PE 150-bp whole metagenome sequencing. The sequenced reads were processed using the same approach as for the European (Hungarian) cohort.
Plasma metabolomic analysis. Untargeted metabolomics profiling of patient plasma samples was performed by Afekta (Kuopio, Finland), as detailed below.
Sample preparation. The plasma samples were prepared as follows: a 100 μL aliquot of the sample was combined with 400 μL of acetonitrile and mixed by pipetting. The samples were placed on a 96-well filter plate, which was centrifuged at 700 × g for 5 min at 4 °C. Small aliquots were taken from each sample, mixed together in a single tube, prepared in an identical way to the other samples, and used as the quality control (QC) sample in the analysis. The fecal samples were prepared as follows: 300 μL of cold 80% aqueous methanol was added per 100 mg of sample into homogenizer tubes. The sample preparation procedures were performed on dry ice with cooled instruments. The samples were homogenized with a Bead Ruptor 24 Elite (OMNI International) using the Heart program (6 m/s, 30 s). Next, the samples were vortexed for 10 s and centrifuged at 13000 rpm and 4 °C for 10 min. The supernatant was collected on a 96-well filter plate, which was centrifuged at 700 × g for 5 min at 4 °C. The QC sample was prepared in the same way as for the plasma samples.
LC-MS analysis. The samples were analyzed by liquid chromatography-mass spectrometry using a 1290 Infinity Binary UPLC coupled with a 6540 UHD Accurate-Mass Q-TOF (Agilent Technologies), as described previously [40]. In brief, a Zorbax Eclipse XDB-C18 column (2.1 × 100 mm, 1.8 μm; Agilent Technologies) was used for the reversed-phase (RP) separation and an Acquity UPLC BEH amide column (Waters) for the HILIC separation. After each chromatographic run, ionization was carried out using jet stream electrospray ionization (ESI) in the positive and negative modes, yielding four data files per sample.
The collision energies for the MS/MS analysis were selected as 10, 20 and 40 V, for compatibility with spectral databases.
Data analysis. The data analysis was performed separately on each of the four modes and sample type combinations, resulting in total 8 preprocessing runs. The analysis was conducted in R version 3.5.0 using in-house scripts. Signals with too many missing values were removed by requiring a measured value in at least 60% of the samples in at least one of the study groups. The signals were corrected for the drift pattern caused by the LC-MS procedures. Regularized cubic spline regression was t separately for each signal on the QC samples. The smoothing parameter was chosen from an interval between 0.5 and 1.5 using leave-one-out cross validation to prevent over tting. The performance of the drift correction was assessed using non-parametric, robust estimates of relative standard deviation of QC samples (RSD*) and D-ratio* as quality metrics. Drift correction was only applied if the value of both quality metrics decreased, leading to enhanced quality. Otherwise, the original signal was retained. After the drift correction, low quality signals were removed. Signals were kept if their RSD* was below 20% and their Dratio below 40%. In addition, signals with classic RSD, RSD* and basic D-ratio all be-low 10% were kept. This additional condition prevents the removal of signals with very low values in all but a few samples. These signals tend to have a very high value of D-ratio*, since the median absolute deviation of the biological samples is not affected by the large concentration in a handful of samples, causing the Dratio* to overestimate the signi cance of random errors in measurements of QC samples. Thus, other quality metrics were applied with conservative limit of 0.1 to ensure that only good quality signals were kept this way. Missing values were imputed using random forest imputation. Signals were then normalized using inverse-rank normalization, to approximate a normal distribution. 
QC samples were removed prior to imputation and normalization, to prevent them from biasing the procedures.
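The signal-retention rule and the inverse-rank normalization described above can be illustrated with a minimal Python sketch. It is not the in-house R code used in the study. It assumes the commonly used definitions RSD = SD(QC)/mean(QC) and D-ratio = SD(QC)/SD(biological samples), with the robust variants replacing the SD by the scaled median absolute deviation (MAD) and the mean by the median. The rank offset of 0.5 in the normalization, the arbitrary handling of ties, and all function names are likewise assumptions made for illustration.

```python
from statistics import NormalDist, mean, median, stdev

MAD_SCALE = 1.4826  # makes the MAD comparable to the SD for normal data


def mad(values):
    """Scaled median absolute deviation."""
    center = median(values)
    return MAD_SCALE * median([abs(v - center) for v in values])


def _ratio(num, den):
    # A zero denominator (e.g. MAD of a mostly-constant signal) gives +inf.
    return num / den if den > 0 else float("inf")


def quality_metrics(qc, bio):
    """Classic and robust RSD and D-ratio for one signal (assumed definitions)."""
    return {
        "rsd": _ratio(stdev(qc), mean(qc)),
        "rsd_robust": _ratio(mad(qc), median(qc)),
        "d_ratio": _ratio(stdev(qc), stdev(bio)),
        "d_ratio_robust": _ratio(mad(qc), mad(bio)),
    }


def keep_signal(m):
    """Main rule: RSD* < 20% and D-ratio* < 40%.
    Additional rule: classic RSD, RSD* and basic D-ratio all below 10%."""
    main = m["rsd_robust"] < 0.20 and m["d_ratio_robust"] < 0.40
    additional = max(m["rsd"], m["rsd_robust"], m["d_ratio"]) < 0.10
    return main or additional


def inverse_rank_normalize(values):
    """Map values to normal quantiles via their ranks (ties broken arbitrarily)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return out


# A signal that is near zero in all but two biological samples:
# the MAD of the biological samples is 0, so D-ratio* is infinite,
# but the additional rule still retains the signal.
sparse = quality_metrics(qc=[270, 272, 268, 271, 269],
                         bio=[1, 1, 1, 1, 1, 900, 1000])
print(keep_signal(sparse))
```

The sparse example mirrors the case discussed in the text: the robust D-ratio alone would reject the signal, whereas the classic metrics show that the QC measurements are in fact highly reproducible.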
Compound identification. The chromatographic and mass spectrometric characteristics (retention time, exact mass, and MS/MS spectra) of the significantly differential molecular features were compared with entries in an in-house standard library and publicly available databases, such as METLIN and HMDB.
Metagenomic sequencing and read quality control. To examine the gut microbiome of our lung cancer cohort, fecal samples were collected from 31 lung cancer patients at diagnosis, before the initiation of oncotherapy (baseline). Bacterial DNA was isolated from the fecal samples for shotgun metagenomic sequencing. Sequencing was performed on an Illumina HiSeq 4000 with PE150 at an average depth of 6 Gb. The sequenced reads underwent quality control to remove adapter regions, low-quality reads, and human DNA contamination (bwa mem, version 0.7.4-r385, against the human reference genome ucsc.hg19), following previously described steps [41]. Approximately 95% of the reads remained after quality control.
The 471 metagenomic samples from the 500FG project were used as European healthy controls in the taxa comparison [22]. The taxonomic profiles of these 500FG samples were acquired using the R package curatedMetagenomicData (R 3.5.1, curatedMetagenomicData 1.13.3) [42].
Microbial taxonomic profiling and community diversity analysis. The high-quality reads were taxonomically profiled using MetaPhlAn2 [43] with default settings. The differentially abundant taxa were identified using the Wald test implemented in the R package DESeq2 [44] v1.22.2 on the unrarefied relative abundance data, and results were filtered for statistical significance at FDR-corrected p < 0.05 unless otherwise stated. The alpha-diversity (Shannon index) of each sample and the beta-diversity (Bray-Curtis dissimilarity) among samples were calculated with vegan (v2.5.3) [45] based on rarefied data.
Rarefaction was applied to the abundance table of estimated mapped reads, down to the depth of the least abundant sample, in order to equalize the depth among samples. To test for differences in microbial composition between two or more groups, ANOSIM (analysis of similarities) was employed based on the Bray-Curtis dissimilarity. For the comparison of Faecalibacterium prausnitzii strain abundances, the high-quality reads were further taxonomically classified using Kaiju [46], a protein-level classification tool, with the microbial subset of the NCBI BLAST non-redundant protein database (nr).
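The diversity measures named above can be written out in a few lines. The following Python sketch is only an illustration of the definitions and is not the vegan implementation used in the study: it assumes rarefaction by random subsampling without replacement, the Shannon index with the natural logarithm, and the standard Bray-Curtis formula. The function names and the toy count vectors are hypothetical.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Randomly subsample `depth` reads without replacement from per-taxon counts."""
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    picked = random.Random(seed).sample(pool, depth)
    out = [0] * len(counts)
    for taxon in picked:
        out[taxon] += 1
    return out


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


# Two toy samples (hypothetical counts) with unequal sequencing depth.
sample_a = [40, 60, 100]   # 200 reads
sample_b = [30, 30, 40]    # 100 reads
depth = min(sum(sample_a), sum(sample_b))

rare_a = rarefy(sample_a, depth)
rare_b = rarefy(sample_b, depth)

print(shannon(rare_a), shannon(rare_b))
print(bray_curtis(rare_a, rare_b))
```

Rarefying first ensures that both the Shannon index and the Bray-Curtis dissimilarity are computed on samples of equal depth, so differences in sequencing effort do not masquerade as differences in community structure.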
Assembly-free functional annotation. The high-quality reads after quality control were processed using HUMAnN2 (Franzosa et al. Nat Methods. 2018). In the pipeline, the reads were mapped to the UniRef90 gene family database, and the gene families were then regrouped to MetaCyc reactions and KEGG Orthologs (KOs) for pathway annotation. The quantified pathway abundances, in units of RPK (reads per kilobase), were normalized to copies per million (CPM) using the script provided with HUMAnN2 for further analyses. KEGG pathway enrichment analysis was performed using GAGE [30].
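The RPK-to-CPM normalization amounts to rescaling each sample so that its abundances sum to one million. The short Python sketch below illustrates that arithmetic only; it is not the HUMAnN2 script itself, and the function name and pathway identifiers are hypothetical.

```python
def rpk_to_cpm(rpk):
    """Rescale one sample's RPK abundances so that they sum to 1,000,000."""
    total = sum(rpk.values())
    if total <= 0:
        raise ValueError("sample has no mapped abundance to normalize")
    return {feature: value / total * 1_000_000 for feature, value in rpk.items()}


# Hypothetical pathway abundances for a single sample, in RPK.
sample_rpk = {"PWY-A": 30.0, "PWY-B": 10.0, "PWY-C": 60.0}
sample_cpm = rpk_to_cpm(sample_rpk)
print(sample_cpm)
```

Because every sample is rescaled to the same total, CPM values can be compared across samples that differ in sequencing depth.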
De novo assembly and CAZy annotation. The high-quality reads after quality control were further assembled using IDBA-UD [47] with k-mer sizes ranging from 20 to 150 bp. The coding DNA sequence
(C) Receiver operating characteristic (ROC) curve plots of Random Forest models based on the gut microbial taxonomic and pathway features of cancer patients for differentiating lung cancer patients with and without cachexia.

Supplementary Files
This is a list of supplementary files associated with this preprint: SupplementaryInformationv1.7.docx