Using Faecal Immunochemical Cartridges for Gut Microbiome Analysis within a Colorectal Cancer Screening Program

Background: The colorectal cancer (CRC) screening program B-PREDICT is an invited two-stage screening project using a faecal immunochemical test (FIT) for initial screening followed by a colonoscopy for those with a positive FIT. Since the gut microbiome likely plays a role in the etiology of CRC, microbiome-based biomarkers in combination with FIT could be a promising tool for optimizing CRC screening. Therefore, we evaluated the usability of FIT cartridges for microbiome analysis and compared them to Stool Collection and Preservation Tubes. Methods: Samples were collected from participants of the B-PREDICT screening program using FIT cartridges as well as Stool Collection and Preservation Tubes, and 16S rRNA gene sequencing was performed. We calculated intraclass correlation coefficients (ICCs) based on centre log ratio transformed abundances and used ALDEx2 to test for significantly differentially abundant taxa between the two sample types. Additionally, FIT and Stool Collection and Preservation Tube triplicate samples were obtained from volunteers to estimate variance components of microbial abundances. Results: FIT and Preservation Tube samples produce highly similar microbiome profiles, which cluster according to subject. Significant differences between the two sample types can be found for abundances of some bacterial taxa (e.g. 33 genera) but are minor compared to the differences between the subjects. Analysis of triplicate samples revealed slightly worse repeatability of results for FIT than for Preservation Tube samples. Conclusions: Our findings indicate that FIT cartridges are appropriate for gut microbiome analysis nested within CRC screening programs.


Introduction
Colorectal cancer (CRC) is the third leading cancer related cause of death worldwide and represents a major public health issue (1). The natural history of sporadic CRC usually involves slow progression from precancerous polyps to cancer, which offers opportunities for screening and early detection (2). Early detection of CRC is an important issue since stage at diagnosis remains the most important prognostic factor. As CRC is one of the most preventable cancers, population-wide screening programs are recommended in many countries. Screening programs have the potential to detect early precancerous lesions and perform endoscopic removal of adenomas, thereby contributing to the reduction of CRC incidence and mortality (3)(4)(5).
The screening project in Burgenland, a federal state of Austria, "Burgenland Prevention Trial of Colorectal Cancer Disease with Immunological Testing" (B-PREDICT) is an invited two stage screening project for individuals aged between 40 and 80 using a faecal immunochemical test (FIT) for initial screening.
Participants with a positive test are offered a diagnostic colonoscopy. These participants are asked to take part in our "Colorectal Cancer Study of Austria" (CORSA), sign a written informed consent, complete questionnaires and provide an EDTA blood sample and a stool sample for the CORSA biobank.
Nowadays, the preferred approach in testing for occult blood in faeces used for CRC screening programs is the FIT, despite its relatively low specificity and sensitivity. Therefore, there is an urgent demand for novel non-invasive biomarkers, in addition to FIT, to identify those individuals who are more likely to benefit from screening colonoscopy and those who need an earlier or more frequent colonoscopy. The combination of conventional screening methods such as FIT with microbiome-based methods could be a promising tool for early detection of CRC. There is some evidence of carcinogenic mechanisms induced by bacteria (6, 7) and therefore it has been hypothesized that the gut microbiome could play an important role in the development and progression of CRC. Specific changes in the microbiome occur during different stages of colorectal neoplasia, from adenomatous adenomas to early stage cancer, to metastatic disease, supporting an etiologic and diagnostic role for the microbiome (8).
An important issue in microbiome studies is the sample collection methodology. Although recent studies have demonstrated that microbial DNA isolated from FIT cartridges can replace naïve stool samples for microbiome analysis, there is little consensus on standard faecal sample collection methods (9,10). Standardized sample collection methodology, and particularly the feasibility of FIT samples for microbiome analyses within screening projects, is currently being discussed intensively in research networks and consortia focusing on gut microbiome-based biomarkers.
Therefore, we evaluated the microbial reliability, inter- as well as intra-subject variability and usability of stool samples collected in FIT cartridges and Stool Collection and Preservation Tubes from participants of the screening program B-PREDICT as well as additional volunteer samples.

Methods
Questionnaires
CORSA participants and volunteers provided a basic CORSA questionnaire assessing data on body mass index (BMI), smoking history, alcohol consumption, education level, family status, profession, basic dietary habits, information on use of antibiotics and diabetes.

Faecal sample collection
Participants were instructed to collect all stool samples at most three days prior to bowel cleanse and colonoscopy from the same bowel movement and to store them at room temperature until their clinical appointment. In the hospital all samples were frozen and stored at -80°C until DNA extraction. Two sample collection methods were used: FIT cartridges (Eiken Chemical Co., Ltd., Tokyo, Japan) and Stool Collection and Preservation Tubes (Norgen Biotek Corp., Ontario, Canada), henceforth referred to as FIT and Norgen, respectively.
In addition to participants recruited with the B-PREDICT screening, five volunteers provided FIT cartridges as well as Norgen samples in triplicate. Volunteer samples were collected from the same bowel movement, stored for three days at room temperature and frozen at -80°C until DNA extraction.

DNA isolation
DNA isolation was performed from FIT cartridge buffers and matching Norgen samples with the bead-based QIAamp PowerFecal Pro DNA Kit (Qiagen, Hilden, Germany) in combination with a Precellys® 24 homogenizer (VWR International GmbH, Vienna, Austria). The quality and quantity of the DNA were assessed prior to 16S rRNA sequencing using a NanoDrop™ ND-1000 spectrophotometer (VWR International GmbH, Vienna, Austria) and fluorometrically with the Qubit™ dsDNA HS Assay Kit (ThermoFisher Scientific, Vienna, Austria).

16S rRNA gene sequencing
For the analysis of the bacterial microbiota, the variable V3-V4 region of the eubacterial 16S rRNA gene was amplified. The 16S small subunit ribosomal gene is a highly conserved housekeeping gene that can be used to characterize microbial communities within samples. Sample library preparation was performed according to the Illumina protocol (Illumina, San Diego, USA) followed by sequence analysis on the Illumina MiSeq platform.

Read pre-processing and taxonomic classification
Reads were trimmed, filtered and denoised using dada2 (11). Optimal parameters for trimming were identified with Figaro (12). The taxonomy of the resulting amplicon sequence variants (ASVs) was classified using IDTAXA (13) with SILVA v138 (14). For classification at species rank only exact matches were used.

Statistical analysis
We performed our analyses based on ASVs, representing the highest possible resolution, as well as on various taxonomic ranks, representing different levels of aggregation. ASVs present in less than 5% of the analysed samples were excluded. Microbial abundances were transformed using the centre log ratio (CLR) transformation due to the compositional nature of microbiome datasets (15,16). The resulting values are scale invariant and therefore count normalization is unnecessary. Since this transformation cannot be calculated for count matrices containing 0-values, all 0s were imputed using the R package zCompositions applying multiplicative simple replacement (17).
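The zero handling and CLR transformation described above can be illustrated with a minimal sketch. It is not the zCompositions implementation: the function names, the toy counts and the imputation value `delta` (here assumed to be 0.65 of the proportion corresponding to a single read) are illustrative assumptions only.

```python
import math

def multiplicative_replacement(counts, delta=None):
    """Close one sample's counts to proportions and replace zeros.

    Zeros are set to a small value `delta`; the non-zero parts are shrunk
    multiplicatively so that the composition still sums to 1.
    """
    total = sum(counts)
    proportions = [c / total for c in counts]
    n_zero = sum(1 for p in proportions if p == 0)
    if delta is None:
        # Assumption for illustration: 0.65 of the proportion of one read.
        delta = 0.65 / total
    shrink = 1.0 - n_zero * delta
    return [delta if p == 0 else p * shrink for p in proportions]

def clr(composition):
    """Centre log ratio: log of each part minus the mean log of all parts."""
    logs = [math.log(v) for v in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical read counts of four ASVs in one sample.
counts = [120, 0, 30, 850]
imputed = multiplicative_replacement(counts)
clr_values = clr(imputed)

# Scale invariance: multiplying every part by a constant leaves CLR unchanged.
clr_rescaled = clr([1000 * v for v in imputed])
```

Because the CLR values of a sample always sum to zero and do not change when the whole sample is rescaled, no separate normalization for sequencing depth is applied afterwards.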
Differences in sample characteristics between FIT and Norgen samples were visualized using violin plots, i.e. density plots displayed vertically like boxplots (18). Intraclass correlation coefficients (ICCs) (19) were calculated for all ASV abundances between FIT and Norgen samples of the patients (ICC(3,1)) and of the volunteers (ICC(3,k)). Consistency was chosen as the relationship considered to be important, since absolute deviations would not decrease the usability of FIT-originated data for risk prediction. However, ICCs for absolute agreement (ICC(2,1)) were calculated for a range of alpha and beta diversities (i.e. the first component of the resulting Principal Coordinates Analysis) as well as the ASV CLR abundances of the FIT triplicates and of the Norgen triplicates. Additionally, all the abundance-based ICCs were calculated for each taxonomic rank (species to phylum) in the same way as for the ASVs.
Calculating the Euclidean distance between two samples using the CLR values results in the Aitchison distance, which was used to perform a hierarchical clustering of all samples with Ward's clustering criterion (20). ALDEx2 (21) was used to identify significantly differentially abundant ASVs and taxa between FIT and Norgen samples. P-values were corrected for multiple testing using the Benjamini-Hochberg method, considering all tests performed at that specific taxonomic rank as the total number of hypotheses. Effect sizes calculated by ALDEx2 were converted to standardized effect sizes (Cohen's d) (22,23).
The volunteer samples, which consist of triplicates of each sample type, were used to fit a linear model to identify the proportions of the sum of squares explained by the subject and the sample type for each ASV identified in at least three samples. The results were presented together with the sum of squared errors in a ternary plot (24). Additionally, separate linear models were fitted to samples of each type to identify the sample-type-specific proportion of variance explained by subject for each ASV present in at least two samples.
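To make the difference between consistency and absolute agreement concrete, the following sketch computes ICC(3,1), ICC(3,k) and ICC(2,1) from the mean squares of a two-way layout (subjects in rows, sample types in columns), using the standard Shrout and Fleiss formulas. The function name and the toy values are hypothetical, and confidence intervals are not computed.

```python
def icc_two_way(table):
    """ICCs from a subjects x raters table (here: raters = sample types)."""
    n = len(table)       # number of subjects
    k = len(table[0])    # number of measurements per subject
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return {
        # Consistency, single measurement: column offsets are ignored.
        "ICC(3,1)": (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error),
        # Consistency, mean of k measurements.
        "ICC(3,k)": (ms_rows - ms_error) / ms_rows,
        # Absolute agreement, single measurement: column offsets count.
        "ICC(2,1)": (ms_rows - ms_error)
        / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n),
    }

# Hypothetical CLR abundances of one ASV: column 0 = FIT, column 1 = Norgen.
# Norgen is shifted by a constant offset of 2 relative to FIT.
clr_fit_norgen = [[1.0, 3.0], [2.0, 4.0], [3.0, 5.0], [4.0, 6.0]]
icc = icc_two_way(clr_fit_norgen)
```

In this toy example the two sample types rank the subjects identically but differ by a constant shift, so the consistency ICC is 1 while the absolute-agreement ICC is clearly lower. This is the behaviour that motivates choosing consistency when a systematic offset between sample types is not considered harmful.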
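The Benjamini-Hochberg adjustment mentioned above can be sketched as follows, assuming the list passed in holds the p-values of all taxa tested at one taxonomic rank (one family of hypotheses per rank). The function name and the example p-values are hypothetical and are not results of this study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for four taxa tested at the same rank.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
```

Because the number of hypotheses is counted per rank, the same raw p-value can lead to different adjusted values at genus rank (many tests) and at phylum rank (few tests).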
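As a rough illustration of the sum-of-squares decomposition, the sketch below assumes a balanced design and an additive model with main effects for subject and sample type only; whether the original model contained further terms is not stated here. Under these assumptions the two effects are orthogonal, so their sums of squares can be computed from group means. All names and values are hypothetical.

```python
def ss_proportions(values, subjects, sample_types):
    """Proportions of the total sum of squares for one ASV.

    Assumes a balanced design (every subject has the same number of
    samples of each type) and an additive main-effects model.
    """
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)

    def between_groups_ss(labels):
        groups = {}
        for value, label in zip(values, labels):
            groups.setdefault(label, []).append(value)
        return sum(
            len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values()
        )

    ss_subject = between_groups_ss(subjects)
    ss_type = between_groups_ss(sample_types)
    ss_residual = ss_total - ss_subject - ss_type
    return {
        "subject": ss_subject / ss_total,
        "sample_type": ss_type / ss_total,
        "residual": ss_residual / ss_total,
    }

# Hypothetical CLR abundances: 2 subjects x 2 sample types x 3 replicates.
values = [1.0, 1.2, 0.8, 1.1, 1.3, 0.9, 3.0, 3.2, 2.8, 3.1, 3.3, 2.9]
subjects = ["A"] * 6 + ["B"] * 6
sample_types = (["FIT"] * 3 + ["Norgen"] * 3) * 2
proportions = ss_proportions(values, subjects, sample_types)
```

The three proportions sum to 1 for every ASV, which is what allows each ASV to be drawn as a single point in a ternary plot.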

Results
Study participants
Eighty-one participants recruited within B-PREDICT provided a FIT tube and a stool nucleic acid collection and preservation tube (Norgen). The median age of patients was 63.4 years and the median BMI was 27.6. Additionally, five healthy volunteers were recruited with a median age of 30.8 years and a median BMI of 23.1 (Table 1).

Norgen and FIT samples produce similar numbers of reads and sequences
The denoised reads contained 6,097 ASVs. Taxonomic classification of all ASVs yielded 241 species, 240 genera, 80 families, 47 orders, 21 classes and 14 phyla, with varying proportions of reads classified at each taxonomic rank (Fig. S1). The median richness was 263 ASVs for FIT samples and 265 ASVs for Norgen samples (Fig. 1D) and the median number of reads after filtering was 60,283 for FIT and 59,266 for Norgen (Fig. 1E).
Of the identified ASVs, 1,029 (16.9%) were detected in more than 5% of the samples. The median prevalence of these ASVs (i.e. percentage of samples in which an ASV was detected) was 11.5% in FIT samples and 12.5% in Norgen samples (Fig. 1F).

Average CLR abundances similar between FIT and Norgen
The average CLR abundances of ASVs display high similarity between the FIT and the Norgen samples (Fig. 1A). Among the ten ASVs with the highest differences between sample types, seven are more abundant in FIT samples. Of these, the highest difference can be observed for an ASV belonging to the genus Escherichia-Shigella of the phylum Proteobacteria; the rest belong to the genera Enterococcus, Lactococcus, Streptococcus and Leuconostoc. Of the three ASVs with higher abundance in Norgen samples, one belongs to the genus Oscillibacter and two could not be classified at genus rank. Complete results, including the comparisons at each taxonomic rank, are available in Table S1 and Fig. S1.

ICCs positively associated with abundance of ASVs
The ASV-specific ICCs between the FIT and Norgen samples of the patients (Fig. 1B) display a positive association with the summed log abundances of the ASVs. Low summed log abundances are in many cases accompanied by low ICCs and wide confidence intervals. This indicates that the estimates lack precision for many of the rarer ASVs. Overall, the first quartile of the ICCs is 0.759, the median is 0.892 and the third quartile is 0.951. A common interpretation is that an ICC higher than 0.75 indicates good reliability and an ICC higher than 0.9 indicates excellent reliability (19). Fig. S1 provides visualizations of this analysis for taxonomic ranks from species to phylum and Table S2 contains complete ICC estimates and confidence intervals for all bacterial taxa and ASVs. These results confirm an association between abundance and reliability. Additionally, they indicate higher reliability for higher ranks, probably because higher ranks result in deeper aggregation and higher proportions of classified reads.
ICCs were also estimated based on the volunteer samples, which consisted of triplicates for each sample type. The "between FIT and Norgen" ICCs were therefore calculated based on the means of the respective samples. Additionally, this allowed for the estimation of the ICCs within the FIT and within the Norgen samples (Fig. S2). However, these were calculated as being obtained from three separate random raters (i.e. triplicate samples), resulting in lower and less stable estimates than the "between FIT and Norgen" ICCs, which makes a direct comparison of these results impossible. Nevertheless, this analysis shows that even separate stool samples of the same sample type and from the same subject contain noteworthy heterogeneity.
The ICC estimates for the alpha and beta diversities and their confidence intervals can be seen in Fig. 1C. The Shannon, Simpson and Inverse Simpson indices all display ICCs above 0.75, with Shannon providing the highest reliability between FIT and Norgen. In the case of the beta diversities, the Bray-Curtis dissimilarity and the Jaccard index result in almost perfect agreement. Unweighted UniFrac also displays an excellent ICC, while the weighted version results in only good reliability.

Samples form subject-specific clusters
The inter-subject distances (i.e. all possible distances between two samples from different subjects) displayed a median of 82.4, a maximum of 113.0 and a minimum of 50.1; even this minimum is higher than the maximum of all intra-subject distances, namely 41.2. The intra-subject distances consist of the distances between the FIT and the Norgen samples of each patient (1 distance per patient; median = 26.5) and each volunteer (9 distances per volunteer; median = 26.4) as well as the distances among the FIT triplicates (3 distances per volunteer; median = 25.4) and among the Norgen triplicates (3 distances per volunteer; median = 23.5) of each volunteer (Fig. 2A). The intra-volunteer distances differed significantly between these categories (Kruskal-Wallis test: p < 0.001), and of the subsequent pairwise tests only the comparison between "FIT to Norgen" distances and "Norgen to Norgen" distances reached statistical significance (Wilcoxon test: p < 0.001). Based on these distances a hierarchical clustering was performed on all samples. All samples clustered together according to the subject who provided them before being joined with samples of other subjects (Fig. 2B).
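The separation of inter- and intra-subject distances can be sketched with a small example. The Aitchison distance is computed as the Euclidean distance between CLR-transformed samples; the compositions and subject labels below are invented for illustration and contain no zeros, so no imputation step is shown.

```python
import math
from itertools import combinations

def clr(composition):
    """Centre log ratio transformation of one strictly positive sample."""
    logs = [math.log(v) for v in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the CLR values of two samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

# Hypothetical compositions keyed by (subject, sample type).
samples = {
    ("S1", "FIT"): [0.50, 0.30, 0.20],
    ("S1", "Norgen"): [0.48, 0.32, 0.20],
    ("S2", "FIT"): [0.10, 0.20, 0.70],
    ("S2", "Norgen"): [0.12, 0.18, 0.70],
}

intra_subject, inter_subject = [], []
for (key_a, comp_a), (key_b, comp_b) in combinations(samples.items(), 2):
    distance = aitchison_distance(comp_a, comp_b)
    if key_a[0] == key_b[0]:
        intra_subject.append(distance)   # same subject, any sample type
    else:
        inter_subject.append(distance)   # different subjects

# Subject-specific clustering is expected when this holds.
clearly_separated = min(inter_subject) > max(intra_subject)
```

A pairwise distance matrix built this way could be passed to a hierarchical clustering routine, for example `scipy.cluster.hierarchy.linkage` with `method="ward"`, although that is only one possible tool and not necessarily the one used for Fig. 2B.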

PCA of volunteer samples reveals no separability of FIT and Norgen samples
The first four principal components of the ASVs' CLR abundances in the volunteer samples are shown in Fig. 3B and reveal no sample type-specific clusters. Only the samples obtained from volunteer no. 4 display some slight separability between FIT and Norgen samples. However, all other samples cluster randomly around a subject-specific centre, regardless of the sample type.

Differential abundance detected at various taxonomic ranks
Bacterial abundances of the patients' FIT and Norgen samples were compared at all taxonomic ranks (species to phylum) and the significant results are presented in Fig. 3A as a taxonomic tree. Some branches of the tree display consistent differences between the sample types. For example, all significantly differentially abundant taxa belonging to the phylum Actinobacteriota or the class Bacilli are more abundant in FIT samples, while all the significant taxa belonging to the class Bacteroidales are more abundant in the Norgen samples. However, there are also inconsistent branches, like the family Lachnospiraceae, which contains both genera more abundant in FIT and genera more abundant in Norgen samples. Complete results of the ALDEx2 analysis are available in Table S3.

Table 2 List of full names for taxa displayed in Fig. 3A.
Effect sizes were calculated with ALDEx2 and are only shown for significant (after p-value correction) differences. Negative values indicate higher abundances in FIT samples, while positive values correspond to higher abundances in Norgen samples.

Sample type explains only small proportion of sum of squares
Linear models were fitted on the CLR abundances of the volunteer samples for all ASVs detected in at least 3 of the 30 samples. The resulting proportions of sum of squares explained by subject and sample type as well as the residual proportions are shown in a ternary plot in Fig. 4A and the corresponding boxplots in Fig. 4B. This shows that most of the variance in the ASVs' CLR abundances can be explained by the subject, compared to only small amounts which are explained by the sample type. Some ASVs display a high proportion of residual variance, which overall constitutes a much bigger issue for the repeatability of results. This is also evident from the results of separate models for FIT and Norgen using ASVs detected in at least three samples of the respective type (Fig. 4C). This model specification shows that the amounts of variance explained by subject are slightly lower (i.e. residual variance is higher) for FIT than for Norgen. For both sample types, there is a peak at proportions near 1, which is slightly less pronounced for FIT and corresponds to a lower mean of 0.930 for FIT, compared to 0.936 for Norgen.

Discussion
Several CRC screening programs such as B-PREDICT have implemented a two-stage screening, using FIT for the initial screening. The combination of conventional screening methods such as FIT with microbiome-based methods could be a promising tool for optimizing early detection of CRC. To investigate the usability of FITs for gut microbiome analysis we compared FIT samples as well as stool samples collected in conventional Preservation Tubes (Norgen) from participants of the B-PREDICT screening program and additional volunteers.
Our findings are mostly in accordance with previously published studies. Sinha et al. (7) demonstrated in their study comprising 20 volunteers that the Faecal Occult Blood Test (FOBT) is a reasonable sample collection method with optimal stability and reproducibility for 16S rRNA microbiome profiling. The recently published study "Comparison of fecal sample collection methods for microbial analysis embedded within colorectal cancer screening programs" (25) found that microbial data obtained from FIT cartridges and specimen collection cards are stable and may be appropriate methods to collect faecal samples for gut microbiome analysis in population-based cohort studies.
Our results show that FIT and Norgen samples differ mainly in two specific attributes. First, Norgen samples display a lower residual variance, i.e. higher repeatability. We have shown that the median FIT to FIT distance is 8.0% higher than the median Norgen to Norgen distance, representing the increase in unaccounted variation across the complete microbiome profile. Second, there are differences in abundances of several taxa due to sample type. Although the overall effect of the sample type on the microbiome profile is only slight, significant differences between FIT and Norgen were detected for some taxa within B-PREDICT participants. These results are supported by the analysis of triplicate volunteer samples. However, it is also evident that even for the ASVs affected by the sample type, the resulting microbial abundance is much more strongly influenced by the subject. Subject-specific agreement of ASV abundances is only slightly affected by sample type and clearly more negatively affected by residual variance, which probably arises due to issues like zero-inflation (29) and false-positive detection, which impact low-abundance taxa more strongly and are inherent to microbiome analysis. Generally, taxa with low abundances are associated with lower agreement and lower ICCs. Therefore, increasing the taxonomic rank on which an analysis is performed (e.g. from genus to family) leads to results indicating higher reliability.
A limitation of our study is that no homogenization of sample material during sampling was performed, thereby inevitably introducing variation into samples from the same subject. To assess a baseline of this variation, triplicate samples were obtained from volunteers and incorporated into the analysis. However, FIT samples analysed in the present study were obtained in the course of the regular B-PREDICT process, representing a usual sampling procedure within a CRC screening.
Our findings, taken together with previous studies, demonstrate that FIT samples are feasible for gut microbiome (16S rRNA) analysis nested within CRC screening programs.

Conclusion
Gut microbiota profiling may be a promising tool to optimize current CRC screening programs. However, validation in larger studies as well as association studies, linking microbiome profiles and clinical outcomes, are warranted.

Competing interests
The authors declare that they have no competing interests.

Funding
This study was funded by the "Österreichische Forschungsförderungsgesellschaft" FFG BRIDGE (grant 880626, to Andrea Gsur) and was supported by COST Action CA17118.

Authors' contributions
SB, MB and AG designed and coordinated the study; AG supervised this study. AG together with BS and NG received funding to conduct the study. FB, PG, TG, MH, GL and RL collected samples and coordinated sample recruitment; SB carried out DNA isolation and sample preparation. SB, MB, AB, AF, CJ and AG participated in data analysis and result interpretation; MB prepared figures and tables; SB, MB and AG drafted the manuscript. All authors have read and approved the manuscript.

Figure 2
A: Boxplots of distances between samples of different subjects (regardless of sample type), between FIT and Norgen samples of the same patient and, for the volunteers, between FIT and Norgen samples, between FIT samples and between Norgen samples of the same volunteer. B: Result of a hierarchical clustering algorithm based on the distances between all samples. Samples originating from the same subject are connected with coloured bars.

Figure 3
A: Taxonomic tree displaying significant differences between FIT and Norgen samples based on the ALDEx2 analysis. Taxa are labelled with an ID and the first letters of their name. Full taxa names are given in Table 2. B: Scatterplots of the first four principal components extracted from the volunteer samples. Each of the five volunteers is represented by a number.

Figure 4